<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Amit Kayal</title>
    <description>The latest articles on DEV Community by Amit Kayal (@amitkayal).</description>
    <link>https://dev.to/amitkayal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F500645%2Fe0c703c3-855c-4fbd-a1c0-b546a60c022e.png</url>
      <title>DEV Community: Amit Kayal</title>
      <link>https://dev.to/amitkayal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/amitkayal"/>
    <language>en</language>
    <item>
      <title>Hosting MCP Gateway Registry on AWS ECS: A Practical Blueprint for Enterprise Agentic AI Systems</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Sun, 24 May 2026 09:13:03 +0000</pubDate>
      <link>https://dev.to/amitkayal/hosting-mcp-gateway-registry-on-aws-ecs-a-practical-blueprint-for-enterprise-agentic-ai-systems-18a4</link>
      <guid>https://dev.to/amitkayal/hosting-mcp-gateway-registry-on-aws-ecs-a-practical-blueprint-for-enterprise-agentic-ai-systems-18a4</guid>
      <description>&lt;h1&gt;
  
  
  Hosting MCP Gateway Registry on AWS ECS: A Practical Blueprint for Enterprise Agentic AI Systems
&lt;/h1&gt;

&lt;p&gt;AI agents are no longer just demo applications that answer questions.&lt;/p&gt;

&lt;p&gt;They are slowly becoming systems that can take action: search customer records, update opportunities, generate quotes, create tickets, check inventory, read contracts, trigger workflows, and interact with business applications.&lt;/p&gt;

&lt;p&gt;That is where the real enterprise problem begins.&lt;/p&gt;

&lt;p&gt;When an AI agent only chats, the risk is limited. But when an agent starts using tools, APIs, and enterprise systems, we need a much stronger operating model. We need to know what the agent can access, who approved that access, what data it can touch, and how we can monitor every action.&lt;/p&gt;

&lt;p&gt;This is exactly where an &lt;strong&gt;MCP Gateway and Registry&lt;/strong&gt; becomes important.&lt;/p&gt;

&lt;p&gt;The MCP Gateway Registry gives us a central place to register MCP servers, discover available tools, manage authentication, control access, and observe how agents interact with enterprise capabilities.&lt;/p&gt;

&lt;p&gt;In this post, I walk through how to host an MCP Gateway Registry on AWS using ECS Fargate, following the Terraform AWS ECS deployment model from the MCP Gateway Registry project. The post is based on the repo &lt;a href="https://github.com/agentic-community/mcp-gateway-registry/tree/main" rel="noopener noreferrer"&gt;https://github.com/agentic-community/mcp-gateway-registry/tree/main&lt;/a&gt; and all credit goes to the repo contributors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Problem Matters
&lt;/h2&gt;

&lt;p&gt;In early AI agent projects, the architecture usually starts simple.&lt;/p&gt;

&lt;p&gt;One agent connects to one or two tools.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sales Agent
   |
   |-- Salesforce MCP Server
   |-- Knowledge Base MCP Server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works well for a proof of concept.&lt;/p&gt;

&lt;p&gt;But after some time, more teams start building agents.&lt;/p&gt;

&lt;p&gt;The sales team wants Salesforce and quote tools.&lt;br&gt;
The support team wants ticketing and knowledge base tools.&lt;br&gt;
The finance team wants billing and contract tools.&lt;br&gt;
The delivery team wants Jira, project reports, and document search tools.&lt;br&gt;
The leadership team wants reporting and analytics agents.&lt;/p&gt;

&lt;p&gt;Very quickly, the environment starts looking like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent 1 ---&amp;gt; MCP Server A
Agent 1 ---&amp;gt; MCP Server B
Agent 2 ---&amp;gt; MCP Server A
Agent 2 ---&amp;gt; MCP Server C
Agent 3 ---&amp;gt; MCP Server D
Agent 4 ---&amp;gt; MCP Server B
Agent 5 ---&amp;gt; MCP Server E
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this stage, the issue is no longer just technical integration.&lt;/p&gt;

&lt;p&gt;The real problems are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Who owns each MCP server?
Which agent is allowed to use which server?
What permissions does each tool have?
How do we prevent duplicate MCP servers?
How do we audit tool usage?
How do we onboard new tools safely?
How do we remove old or risky tools?
How do we monitor failures?
How do we stop agents from accessing sensitive systems without approval?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we do not solve this early, the MCP layer can become another uncontrolled integration layer.&lt;/p&gt;

&lt;p&gt;And in enterprise systems, uncontrolled integration always becomes a risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  What an MCP Gateway Registry Actually Does
&lt;/h2&gt;

&lt;p&gt;An MCP Gateway Registry acts as a control plane between AI agents and MCP servers.&lt;/p&gt;

&lt;p&gt;Instead of letting every agent directly connect to every MCP server, we introduce a managed gateway and registry layer.&lt;/p&gt;

&lt;p&gt;The architecture becomes cleaner:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI Agents / Developers / Applications
              |
              v
      MCP Gateway and Registry
              |
              v
        Approved MCP Servers
              |
              v
      Enterprise Applications
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us a much better operating model.&lt;/p&gt;

&lt;p&gt;The registry helps maintain information about available MCP servers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Server name
Owner
Description
Capabilities
Available tools
Security scopes
Environment
Version
Health status
Approval status
Discovery metadata
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gateway helps control and route access:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Authentication
Authorization
Tool discovery
Request routing
Policy enforcement
Logging
Monitoring
Access control
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is important because enterprise agents should not randomly discover and use tools. They should use approved tools with approved scopes through a governed access path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Hosting This on AWS ECS Makes Sense
&lt;/h2&gt;

&lt;p&gt;There are multiple ways to host an MCP Gateway Registry.&lt;/p&gt;

&lt;p&gt;You can run it on virtual machines.&lt;br&gt;
You can deploy it on Kubernetes.&lt;br&gt;
You can run it on ECS.&lt;br&gt;
You can even start with a simple Docker Compose deployment for local testing.&lt;/p&gt;

&lt;p&gt;But for an enterprise-grade AWS deployment, &lt;strong&gt;ECS Fargate is a very practical option&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It gives us a managed container runtime without the operational overhead of managing EC2 worker nodes or a full Kubernetes control plane.&lt;/p&gt;

&lt;p&gt;For this type of gateway, ECS Fargate gives a good balance between simplicity and production readiness.&lt;/p&gt;

&lt;p&gt;Key benefits include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;No EC2 server management
Container-based deployment
Built-in integration with IAM
Easy logging through CloudWatch
Service-level health checks
Integration with Application Load Balancer
Auto-scaling support
Good fit for Terraform automation
Lower operational complexity than Kubernetes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In my view, unless an organization already has a mature EKS platform and Kubernetes operating model, ECS Fargate is a better first choice for hosting this kind of control-plane service.&lt;/p&gt;

&lt;p&gt;Kubernetes gives more flexibility, but it also adds more operational responsibility. For many teams, that is not needed on day one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Target AWS Architecture
&lt;/h2&gt;

&lt;p&gt;A production-style AWS architecture for MCP Gateway Registry can look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Users / Agents / Developers
          |
          v
Route 53 Custom Domain
          |
          v
CloudFront
          |
          v
AWS WAF
          |
          v
Application Load Balancer
          |
          v
ECS Fargate Services
   |          |           |
Registry   Auth Server   Keycloak
   |          |           |
   |          |           v
   |          |      Aurora PostgreSQL
   |
   v
Amazon DocumentDB

Supporting Services:
- AWS Secrets Manager
- CloudWatch Logs
- CloudWatch Alarms
- ECR
- IAM
- ACM
- Optional Prometheus and Grafana
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not just about running containers.&lt;/p&gt;

&lt;p&gt;This architecture gives us:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Secure external access
Managed container hosting
Central authentication
Registry persistence
Secret management
Observability
Certificate management
Custom domain support
Infrastructure automation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the difference between a demo deployment and an enterprise deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core AWS Components
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Amazon ECS Fargate
&lt;/h3&gt;

&lt;p&gt;ECS Fargate runs the containerized services.&lt;/p&gt;

&lt;p&gt;The deployment can include multiple services such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MCP Gateway Registry
Authentication server
Keycloak
MCP gateway service
Sample MCP servers
Sample agents
Observability components
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each service runs as one or more ECS tasks.&lt;/p&gt;

&lt;p&gt;In production, I would recommend separating these into clear services rather than bundling too much into one container. This gives better control over scaling, logging, deployments, and troubleshooting.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Registry service       --&amp;gt; Handles MCP server metadata and discovery
Auth service           --&amp;gt; Handles authentication flow
Keycloak service       --&amp;gt; Identity and access management
Sample MCP services    --&amp;gt; Optional, mostly for demo or validation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production, sample agents and sample MCP servers should be disabled or deployed only in a non-production environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Application Load Balancer
&lt;/h3&gt;

&lt;p&gt;The Application Load Balancer exposes the ECS services through HTTPS endpoints.&lt;/p&gt;

&lt;p&gt;It routes each request to the target group of the matching ECS service.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/registry  --&amp;gt; Registry service
/auth      --&amp;gt; Auth service
/keycloak  --&amp;gt; Keycloak service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, in a cleaner production model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;registry.company.com  --&amp;gt; Registry service
auth.company.com      --&amp;gt; Auth service
kc.company.com        --&amp;gt; Keycloak
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This domain-based separation is better for enterprise usage because it improves clarity, security boundaries, and operational ownership.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. CloudFront
&lt;/h3&gt;

&lt;p&gt;CloudFront can sit in front of the ALB.&lt;/p&gt;

&lt;p&gt;For production, this is useful because it gives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Global edge access
Better TLS handling
Additional protection layer
Integration point for WAF
Cleaner public access pattern
Potential performance benefits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For internal-only deployments, CloudFront may not always be required. But if the registry is accessed by distributed teams, external developers, or cloud-hosted agents, CloudFront becomes useful.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. AWS WAF
&lt;/h3&gt;

&lt;p&gt;I would strongly recommend using AWS WAF in front of internet-facing endpoints.&lt;/p&gt;

&lt;p&gt;The MCP gateway is a sensitive entry point because it controls access to tools. So it should not be exposed casually.&lt;/p&gt;

&lt;p&gt;Useful WAF controls include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Rate limiting
AWS managed rule groups
IP restrictions
Bot protection
Geo restrictions if required
SQL injection protection
Cross-site scripting protection
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is especially important if agents, developers, or external systems access the gateway over the internet.&lt;/p&gt;




&lt;h3&gt;
  
  
  5. Route 53 and ACM
&lt;/h3&gt;

&lt;p&gt;Route 53 manages DNS records.&lt;/p&gt;

&lt;p&gt;ACM provides SSL/TLS certificates.&lt;/p&gt;

&lt;p&gt;This gives us clean URLs such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;registry.company.com
auth.company.com
kc.company.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For enterprise adoption, this matters more than people think. Clean domain names make the platform feel like a real internal product rather than a temporary engineering setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Amazon Aurora PostgreSQL
&lt;/h3&gt;

&lt;p&gt;Aurora PostgreSQL is used for Keycloak data.&lt;/p&gt;

&lt;p&gt;Keycloak needs a relational database to store identity-related information, including:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Users
Realms
Clients
Roles
Sessions
Identity provider configuration
Authentication settings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using Aurora gives better reliability than running a database inside a container.&lt;/p&gt;

&lt;p&gt;For production, I would avoid containerized databases for this type of platform. Identity is too important to treat casually.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Amazon DocumentDB
&lt;/h3&gt;

&lt;p&gt;DocumentDB is used by the registry layer.&lt;/p&gt;

&lt;p&gt;This is where MCP server and agent metadata can be stored.&lt;/p&gt;

&lt;p&gt;Example records may include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MCP server name
MCP server URL
Tool list
Tool descriptions
Security scopes
Server health
Owner team
Environment
Version
Approval state
Risk classification
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Over time, this registry becomes the enterprise catalog for agent-accessible capabilities.&lt;/p&gt;

&lt;p&gt;This is very valuable.&lt;/p&gt;

&lt;p&gt;It allows teams to search and discover what tools already exist instead of rebuilding the same MCP servers again and again.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. AWS Secrets Manager
&lt;/h3&gt;

&lt;p&gt;Secrets Manager should be used for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Database credentials
Keycloak admin credentials
JWT secrets
Client secrets
Service credentials
API keys
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No production credential should be hardcoded inside Terraform files, Docker images, or environment files stored in Git.&lt;/p&gt;

&lt;p&gt;This is basic, but it is often missed in early AI platform projects.&lt;/p&gt;

&lt;h3&gt;
  
  
  9. CloudWatch Logs and Alarms
&lt;/h3&gt;

&lt;p&gt;Every ECS service should write logs to CloudWatch.&lt;/p&gt;

&lt;p&gt;At minimum, we should monitor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Container startup failures
Authentication failures
Registry API errors
Tool discovery failures
Database connection errors
ECS task restarts
ALB 4xx errors
ALB 5xx errors
High latency
Memory pressure
CPU pressure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But for an MCP gateway, infrastructure logs are not enough.&lt;/p&gt;

&lt;p&gt;We also need agent activity logs.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Which agent requested tool discovery?
Which MCP server was selected?
Which tool was invoked?
Which scope was used?
Was the request allowed or denied?
What was the response status?
How long did the tool call take?
Was sensitive data involved?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where the MCP gateway starts becoming a governance system, not just a routing layer.&lt;/p&gt;
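
&lt;p&gt;To make the questions above concrete, here is a minimal sketch of how a gateway could capture one tool call as a structured log entry. The function names and field names are my own illustrative assumptions, not the log format of the MCP Gateway Registry project.&lt;/p&gt;

```python
import json
from datetime import datetime, timezone


def build_audit_record(agent, user, mcp_server, tool, scope,
                       decision, status, latency_ms, sensitive):
    """Return one structured audit entry for a single tool call.

    All field names here are hypothetical and chosen for illustration.
    """
    if decision not in ("allowed", "denied"):
        raise ValueError("decision must be 'allowed' or 'denied'")
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,                # which agent made the request
        "user": user,                  # on whose behalf it acted
        "mcp_server": mcp_server,      # which MCP server was selected
        "tool": tool,                  # which tool was invoked
        "scope": scope,                # which scope was used
        "decision": decision,          # allowed or denied
        "status": status,              # response status of the tool call
        "latency_ms": latency_ms,      # how long the call took
        "sensitive_data": sensitive,   # whether sensitive data was involved
    }


def to_log_line(record):
    # One JSON object per line keeps entries easy to filter in a log store.
    return json.dumps(record, sort_keys=True)


record = build_audit_record(
    "sales-copilot", "john@company.com", "salesforce-opportunity-server",
    "updateOpportunityStage", "salesforce.opportunity.update",
    "allowed", 200, 182, True,
)
print(to_log_line(record))
```

&lt;p&gt;Writing one JSON object per line to standard output is usually enough on ECS, because the container log driver forwards it to CloudWatch Logs, where the fields can be filtered individually.&lt;/p&gt;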

&lt;h2&gt;
  
  
  Deployment Options
&lt;/h2&gt;

&lt;p&gt;The Terraform setup supports different deployment modes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1: CloudFront Only
&lt;/h3&gt;

&lt;p&gt;This is useful for a quick POC.&lt;/p&gt;

&lt;p&gt;You do not need a custom domain. You get a CloudFront-generated URL.&lt;/p&gt;

&lt;p&gt;This is suitable for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Internal demo
Engineering validation
Architecture exploration
Short-term sandbox
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not my preferred option for production, but it is a good way to start quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: Custom Domain Only
&lt;/h3&gt;

&lt;p&gt;In this model, Route 53 and ACM are used, but CloudFront is not enabled.&lt;/p&gt;

&lt;p&gt;You get URLs like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;registry.company.com
kc.company.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is better than a randomly generated URL, but it may not give enough edge protection if exposed publicly.&lt;/p&gt;

&lt;p&gt;This can work well for private/internal deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 3: CloudFront + Custom Domain
&lt;/h3&gt;

&lt;p&gt;Of the three options, this is the strongest model for production.&lt;/p&gt;

&lt;p&gt;Traffic flows like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User / Agent
    |
    v
Custom Domain
    |
    v
CloudFront
    |
    v
WAF
    |
    v
Application Load Balancer
    |
    v
ECS Fargate Service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives a stronger production posture.&lt;/p&gt;

&lt;p&gt;My recommendation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Use CloudFront + Route 53 + WAF for production.
Use CloudFront-only for demo.
Use custom-domain-only for controlled internal environments only.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Practical Deployment Flow
&lt;/h2&gt;

&lt;p&gt;The deployment flow can be divided into clear stages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 1: Prepare AWS Account
&lt;/h3&gt;

&lt;p&gt;Before starting, we should decide:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AWS region
VPC strategy
Domain name
Environment name
Access model
CIDR restrictions
Secrets strategy
Terraform state backend
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production, I would not deploy this into a random shared AWS account.&lt;/p&gt;

&lt;p&gt;Better model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Separate AWS account for dev
Separate AWS account for staging
Separate AWS account for production
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At minimum, use separate environments and separate Terraform state.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 2: Build and Push Images to ECR
&lt;/h3&gt;

&lt;p&gt;The services need to be built as Docker images and pushed to Amazon ECR.&lt;/p&gt;

&lt;p&gt;A simplified flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-east-1
make build-push
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is a set of ECR image URIs.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;123456789012.dkr.ecr.us-east-1.amazonaws.com/mcp-gateway-registry:v1.0.0
123456789012.dkr.ecr.us-east-1.amazonaws.com/mcp-gateway-auth:v1.0.0
123456789012.dkr.ecr.us-east-1.amazonaws.com/mcp-gateway:v1.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production, avoid using &lt;code&gt;latest&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Use versioned immutable tags.&lt;/p&gt;

&lt;p&gt;Bad:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mcp-gateway-registry:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mcp-gateway-registry:v1.0.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Best:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mcp-gateway-registry:v1.0.3-build-20260524
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This helps with rollback, audit, and release traceability.&lt;/p&gt;
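
&lt;p&gt;As a rough sketch, the tagging rule above can be enforced in a small release helper. The function names are hypothetical and are not part of the project's &lt;code&gt;make build-push&lt;/code&gt; target; the registry and repository values reuse the example URIs shown earlier.&lt;/p&gt;

```python
import re
from datetime import date

# Accept release versions of the form v1.0.3 (an assumption for this sketch).
VERSION_PATTERN = re.compile(r"^v\d+\.\d+\.\d+$")


def build_image_tag(version, build_date):
    """Combine a release version and a build date into one traceable tag."""
    if not VERSION_PATTERN.match(version):
        raise ValueError("expected a version like v1.0.3, got " + repr(version))
    return version + "-build-" + build_date.strftime("%Y%m%d")


def image_uri(registry, repository, tag):
    """Build a full image URI and refuse the mutable 'latest' tag."""
    if tag == "latest":
        raise ValueError("refusing to reference the mutable 'latest' tag")
    return registry + "/" + repository + ":" + tag


tag = build_image_tag("v1.0.3", date(2026, 5, 24))
print(image_uri("123456789012.dkr.ecr.us-east-1.amazonaws.com",
                "mcp-gateway-registry", tag))
```

&lt;p&gt;A check like this only guards the naming convention. To stop an existing tag from being overwritten, the ECR repository itself should also have tag immutability enabled.&lt;/p&gt;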

&lt;h3&gt;
  
  
  Stage 3: Configure Terraform Variables
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;terraform.tfvars&lt;/code&gt; file is where we configure the deployment.&lt;/p&gt;

&lt;p&gt;Important values include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;aws_region&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;

&lt;span class="nx"&gt;enable_cloudfront&lt;/span&gt;  &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="nx"&gt;enable_route53_dns&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="nx"&gt;base_domain&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"company.com"&lt;/span&gt;

&lt;span class="nx"&gt;session_cookie_domain&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;".company.com"&lt;/span&gt;
&lt;span class="nx"&gt;session_cookie_secure&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="nx"&gt;ingress_cidr_blocks&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="s2"&gt;"YOUR_OFFICE_IP/32"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;"YOUR_VPN_IP/32"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Database and admin passwords should be handled carefully.&lt;/p&gt;

&lt;p&gt;In a strong production model, these should come from a secure secret injection process rather than being manually placed in local files.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 4: Initialize Terraform
&lt;/h3&gt;

&lt;p&gt;Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform init &lt;span class="nt"&gt;-upgrade&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production, Terraform state should be stored remotely.&lt;/p&gt;

&lt;p&gt;Recommended backend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;S3 bucket for state
State locking (DynamoDB table, or S3 native lockfile on recent Terraform versions)
KMS encryption
Restricted IAM access
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do not use local state for production.&lt;/p&gt;

&lt;p&gt;Local state is acceptable for learning, but not for enterprise infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 5: Create Certificates First
&lt;/h3&gt;

&lt;p&gt;Public ACM certificates must be validated before they can be used, and DNS validation is the usual method when the domain is hosted in Route 53.&lt;/p&gt;

&lt;p&gt;That is why the deployment may need a first targeted apply for certificates.&lt;/p&gt;

&lt;p&gt;Conceptually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform apply &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;aws_acm_certificate.keycloak &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;aws_acm_certificate.registry &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;aws_acm_certificate_validation.keycloak &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;aws_acm_certificate_validation.registry
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows certificates to be created and validated before the rest of the infrastructure depends on them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 6: Deploy Full Infrastructure
&lt;/h3&gt;

&lt;p&gt;After certificate validation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This deploys the full stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Networking
Security groups
ECS cluster
ECS services
ALB
Target groups
CloudFront
Route 53 records
Aurora PostgreSQL
DocumentDB
Secrets
CloudWatch logs
IAM roles
Optional observability stack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, the infrastructure is created, but the application may still need initialization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 7: Run Post-Deployment Setup
&lt;/h3&gt;

&lt;p&gt;Post-deployment setup is very important.&lt;/p&gt;

&lt;p&gt;This step usually performs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Terraform output extraction
DNS validation
ECS service health checks
Keycloak realm setup
Client setup
Admin user setup
DocumentDB collection initialization
Registry indexes
Scope setup
Service restart
Endpoint validation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This step converts infrastructure into a usable platform.&lt;/p&gt;

&lt;p&gt;Without this, the containers may be running, but the gateway may not be fully ready.&lt;/p&gt;
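
&lt;p&gt;The endpoint validation step can be sketched as a simple readiness loop. This is a minimal illustration, not the project's post-deployment script: the function name, the &lt;code&gt;/health&lt;/code&gt; paths, and the injected &lt;code&gt;probe&lt;/code&gt; callable are all assumptions. In a real run, &lt;code&gt;probe&lt;/code&gt; would issue an HTTPS request and return whether the response was successful.&lt;/p&gt;

```python
import time


def wait_until_healthy(endpoints, probe, attempts=5, delay_seconds=0.0):
    """Probe endpoints until all are healthy or attempts run out.

    Returns the set of endpoints that are still unhealthy (empty on success).
    `probe` is any callable that takes a URL and returns True when healthy.
    """
    pending = set(endpoints)
    for attempt in range(attempts):
        # Keep only the endpoints that still fail their probe.
        pending = {url for url in pending if not probe(url)}
        if not pending:
            break
        if attempt != attempts - 1:
            time.sleep(delay_seconds)
    return pending


# Stand-in probe for the example: each endpoint fails once, then succeeds.
calls = {}


def fake_probe(url):
    calls[url] = calls.get(url, 0) + 1
    return calls[url] == 2


still_failing = wait_until_healthy(
    ["https://registry.company.com/health", "https://kc.company.com/health"],
    fake_probe,
    attempts=3,
)
print(sorted(still_failing))  # [] once both endpoints have recovered
```

&lt;p&gt;Returning the endpoints that never became healthy, instead of a plain boolean, makes a failed setup run easier to diagnose.&lt;/p&gt;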

&lt;h2&gt;
  
  
  How the Gateway Should Be Used After Hosting
&lt;/h2&gt;

&lt;p&gt;Once deployed, teams can start registering MCP servers.&lt;/p&gt;

&lt;p&gt;A good MCP server registration should include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Server name
Business capability
Owner team
Technical owner
Environment
Base URL
Supported tools
Required scopes
Risk level
Data classification
Health check endpoint
Approval status
Version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Name: Salesforce Opportunity MCP Server
Owner: Sales Platform Team
Environment: Production
Tools:
- searchOpportunity
- updateOpportunityStage
- getAccountDetails
Scopes:
- salesforce.read
- salesforce.opportunity.update
Risk: High
Data: Customer and revenue data
Approval: Required
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This level of metadata is important.&lt;/p&gt;

&lt;p&gt;Without it, the registry becomes just another technical catalog. With it, the registry becomes a real enterprise control plane.&lt;/p&gt;
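
&lt;p&gt;One way to keep that metadata complete is to reject registrations with missing fields before they reach the catalog. The sketch below is illustrative only: the field names are my own and do not reflect the registry's actual schema.&lt;/p&gt;

```python
# Hypothetical required fields, loosely following the checklist above.
REQUIRED_FIELDS = (
    "name", "owner_team", "environment", "base_url", "tools",
    "scopes", "risk_level", "data_classification", "approval_status",
    "version",
)


def missing_fields(registration):
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not registration.get(field)]


registration = {
    "name": "Salesforce Opportunity MCP Server",
    "owner_team": "Sales Platform Team",
    "environment": "Production",
    "tools": ["searchOpportunity", "updateOpportunityStage", "getAccountDetails"],
    "scopes": ["salesforce.read", "salesforce.opportunity.update"],
    "risk_level": "High",
    "data_classification": "Customer and revenue data",
    "approval_status": "Required",
}

# The example omits base_url and version, so both are reported.
print(missing_fields(registration))  # ['base_url', 'version']
```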

&lt;h2&gt;
  
  
  Enterprise Governance Model
&lt;/h2&gt;

&lt;p&gt;For enterprise usage, I would define a clear lifecycle for MCP servers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Suggested MCP Server Lifecycle
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Draft
   |
Submitted for Review
   |
Security Review
   |
Approved for Dev
   |
Approved for Production
   |
Monitored
   |
Deprecated
   |
Retired
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every MCP server should have an owner.&lt;/p&gt;

&lt;p&gt;Every high-risk tool should have approval.&lt;/p&gt;

&lt;p&gt;Every production MCP server should have monitoring.&lt;/p&gt;

&lt;p&gt;Every deprecated server should have a retirement date.&lt;/p&gt;

&lt;p&gt;This may sound heavy, but it is necessary once agents start touching real systems.&lt;/p&gt;
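
&lt;p&gt;The lifecycle can be enforced with a very small state machine. This sketch assumes the stages are strictly linear, which is a simplification on my part: a real process would likely also need a rejection path back to Draft.&lt;/p&gt;

```python
# Stages in order, taken from the suggested lifecycle above.
LIFECYCLE = [
    "Draft",
    "Submitted for Review",
    "Security Review",
    "Approved for Dev",
    "Approved for Production",
    "Monitored",
    "Deprecated",
    "Retired",
]


def next_state(current):
    """Return the stage that follows `current`, or None after Retired."""
    index = LIFECYCLE.index(current)  # raises ValueError for unknown stages
    if index == len(LIFECYCLE) - 1:
        return None
    return LIFECYCLE[index + 1]


def can_transition(current, target):
    """Allow only a move to the immediately following stage."""
    return next_state(current) == target


print(can_transition("Draft", "Submitted for Review"))     # True
print(can_transition("Draft", "Approved for Production"))  # False
```

&lt;p&gt;Blocking skipped stages is the point of the check: a server cannot reach production without passing through security review first.&lt;/p&gt;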




&lt;h2&gt;
  
  
  Access Control Model
&lt;/h2&gt;

&lt;p&gt;The gateway should not allow all agents to use all MCP servers.&lt;/p&gt;

&lt;p&gt;That is a weak design.&lt;/p&gt;

&lt;p&gt;A better model is scope-based access.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent: Sales Copilot
Allowed scopes:
- salesforce.read
- quote.read
- product.search

Not allowed:
- discount.approve
- contract.delete
- customer.export
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Another example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent: Deal Desk Agent
Allowed scopes:
- quote.read
- quote.update
- discount.request
- contract.read

Requires approval:
- discount.approve
- final_quote.submit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is how we prevent agents from becoming over-permissioned.&lt;/p&gt;

&lt;p&gt;One of the biggest risks in agentic AI systems will be excessive tool permissions. If we give one agent too many tools and too much authority, it becomes hard to control its behavior and impact.&lt;/p&gt;
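
&lt;p&gt;A minimal sketch of scope-based access, using the two example agents above. The policy structure, agent identifiers, and decision strings are hypothetical; the actual gateway defines its own policy format. The important design choice is the default: anything not explicitly listed is denied.&lt;/p&gt;

```python
# Hypothetical policy table built from the two examples above.
POLICIES = {
    "sales-copilot": {
        "allowed": {"salesforce.read", "quote.read", "product.search"},
        "requires_approval": set(),
    },
    "deal-desk-agent": {
        "allowed": {"quote.read", "quote.update", "discount.request",
                    "contract.read"},
        "requires_approval": {"discount.approve", "final_quote.submit"},
    },
}


def decide(agent, scope):
    """Return 'allow', 'needs_approval', or 'deny' for one scope request."""
    policy = POLICIES.get(agent)
    if policy is None:
        return "deny"  # unknown agents get nothing
    if scope in policy["allowed"]:
        return "allow"
    if scope in policy["requires_approval"]:
        return "needs_approval"
    return "deny"  # default deny for anything not listed


print(decide("sales-copilot", "salesforce.read"))      # allow
print(decide("sales-copilot", "discount.approve"))     # deny
print(decide("deal-desk-agent", "discount.approve"))   # needs_approval
```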

&lt;h2&gt;
  
  
  Observability for Agentic Systems
&lt;/h2&gt;

&lt;p&gt;Traditional application monitoring is not enough here.&lt;/p&gt;

&lt;p&gt;We need both system observability and agent observability.&lt;/p&gt;

&lt;h3&gt;
  
  
  System Observability
&lt;/h3&gt;

&lt;p&gt;Track:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CPU
Memory
Container restarts
Task failures
ALB errors
Request latency
Database connections
Authentication errors
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Agent and Tool Observability
&lt;/h3&gt;

&lt;p&gt;Track:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent ID
User ID
Tool requested
MCP server used
Scope used
Decision outcome
Policy result
Execution latency
Failure reason
Data classification
External system touched
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, a useful audit log may look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sales-copilot"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"john@company.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcp_server"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"salesforce-opportunity-server"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"updateOpportunityStage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"salesforce.opportunity.update"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"decision"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"allowed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-24T10:15:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"latency_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;450&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"success"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This type of logging becomes extremely important when something goes wrong.&lt;/p&gt;

&lt;p&gt;If an agent updates the wrong opportunity or calls a pricing tool incorrectly, we should be able to reconstruct exactly what happened.&lt;/p&gt;
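&lt;p&gt;To make that reconstruction possible, every tool call needs to emit the same fields in the same shape. Here is a rough sketch of a helper that builds one such entry. The function name &lt;code&gt;build_audit_record&lt;/code&gt; and its parameters are my own assumptions, not part of any MCP gateway API.&lt;/p&gt;

```python
import json
from datetime import datetime, timezone


def build_audit_record(agent, user, mcp_server, tool, scope,
                       decision, latency_ms, status, timestamp=None):
    """Return one audit entry as a JSON string with the fields shown above."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc)
    record = {
        "agent": agent,
        "user": user,
        "mcp_server": mcp_server,
        "tool": tool,
        "scope": scope,
        "decision": decision,
        # ISO 8601 in UTC with a trailing "Z", matching the sample log.
        "timestamp": timestamp.astimezone(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        ),
        "latency_ms": latency_ms,
        "status": status,
    }
    return json.dumps(record)


# Hypothetical call that reproduces the sample entry above.
line = build_audit_record(
    agent="sales-copilot",
    user="john@company.com",
    mcp_server="salesforce-opportunity-server",
    tool="updateOpportunityStage",
    scope="salesforce.opportunity.update",
    decision="allowed",
    latency_ms=450,
    status="success",
    timestamp=datetime(2026, 5, 24, 10, 15, tzinfo=timezone.utc),
)
print(line)
```

&lt;p&gt;Writing one JSON object per line keeps the entries easy to ship to CloudWatch and to filter later by agent, tool, or decision.&lt;/p&gt;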

&lt;h2&gt;
  
  
  CI/CD Model
&lt;/h2&gt;

&lt;p&gt;For production, deployment should not be manual.&lt;/p&gt;

&lt;p&gt;A good CI/CD pipeline should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Developer raises PR
        |
Code review
        |
Build Docker images
        |
Run unit tests
        |
Run container security scan
        |
Push image to ECR
        |
Terraform plan
        |
Manual approval for production
        |
Terraform apply
        |
Run post-deployment setup
        |
Smoke test
        |
Notify platform team
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps the deployment controlled and auditable.&lt;/p&gt;

&lt;p&gt;For rollback, the team should be able to redeploy a previous image tag quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recommended Environment Strategy
&lt;/h2&gt;

&lt;p&gt;I would recommend at least three environments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Development
Staging
Production
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Development
&lt;/h3&gt;

&lt;p&gt;Used for engineering testing.&lt;/p&gt;

&lt;p&gt;Can have relaxed settings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sample MCP servers allowed
Lower database capacity
CloudFront-only mode acceptable
Limited monitoring
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Staging
&lt;/h3&gt;

&lt;p&gt;Used for pre-production validation.&lt;/p&gt;

&lt;p&gt;Should be close to production.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Custom domain
WAF enabled
Production-like IAM
Production-like secrets
Observability enabled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Production
&lt;/h3&gt;

&lt;p&gt;Used for real enterprise agents.&lt;/p&gt;

&lt;p&gt;Should be hardened.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Separate AWS account
CloudFront + WAF
Private subnets
Strict ingress
Immutable images
Centralized logs
Audit trail
Backup enabled
Approval workflow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Production Hardening Checklist
&lt;/h2&gt;

&lt;p&gt;Before calling this production-ready, I would validate the following.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Remote Terraform state enabled
Terraform state encrypted
DynamoDB locking enabled
Separate AWS accounts or environments
Secrets stored in Secrets Manager
No secrets in Git
CloudFront enabled
WAF enabled
Ingress restricted
Keycloak admin access restricted
ECS tasks in private subnets
ALB security groups reviewed
Aurora backups enabled
DocumentDB backups enabled
CloudWatch alarms configured
Container image scanning enabled
Immutable image tags used
IAM least privilege applied
Audit logging enabled
MCP server ownership defined
Tool scopes defined
Production approval process defined
Runbook created
Rollback process tested
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The most common mistake is to stop after the Terraform deployment succeeds.&lt;/p&gt;

&lt;p&gt;That only means infrastructure exists.&lt;/p&gt;

&lt;p&gt;It does not mean the platform is secure, governed, observable, or ready for production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operational Runbook
&lt;/h2&gt;

&lt;p&gt;For a serious enterprise setup, the platform team should maintain a simple runbook.&lt;/p&gt;

&lt;p&gt;The runbook should answer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;How do we onboard a new MCP server?
How do we approve a production MCP server?
How do we revoke access?
How do we rotate secrets?
How do we check service health?
How do we debug registry failures?
How do we debug authentication failures?
How do we roll back a release?
How do we retire an old MCP server?
How do we investigate suspicious tool usage?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where platform maturity comes in.&lt;/p&gt;

&lt;p&gt;An MCP gateway is not a one-time deployment. It becomes part of the agentic AI platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where This Fits in an Enterprise Agent Architecture
&lt;/h2&gt;

&lt;p&gt;In a broader enterprise agentic AI architecture, the MCP Gateway Registry sits between orchestration and enterprise tools.&lt;/p&gt;

&lt;p&gt;A practical model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Interface
      |
      v
Agent Orchestrator
      |
      v
Policy / Guardrail Layer
      |
      v
MCP Gateway Registry
      |
      v
MCP Servers
      |
      v
Enterprise Systems
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The orchestrator decides what needs to be done.&lt;/p&gt;

&lt;p&gt;The policy layer checks whether the action is allowed.&lt;/p&gt;

&lt;p&gt;The MCP gateway provides controlled tool discovery and access.&lt;/p&gt;

&lt;p&gt;The MCP server performs the actual system interaction.&lt;/p&gt;

&lt;p&gt;This separation is important.&lt;/p&gt;

&lt;p&gt;Do not put all responsibilities into one big agent.&lt;/p&gt;

&lt;p&gt;That becomes hard to scale, hard to debug, and dangerous to govern.&lt;/p&gt;
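&lt;p&gt;The separation can be sketched in a few lines. Everything here is a hypothetical stand-in for the real layers (the &lt;code&gt;REGISTRY&lt;/code&gt; and &lt;code&gt;ALLOWED&lt;/code&gt; tables, &lt;code&gt;handle_tool_call&lt;/code&gt;, and the &lt;code&gt;get_quote&lt;/code&gt; handler are assumed names), but it shows the key property: the orchestrator never calls an MCP server directly.&lt;/p&gt;

```python
from typing import Callable, Dict, Set


# MCP server layer: performs the actual system interaction.
def get_quote(args: dict) -> dict:
    return {"quote_id": args["quote_id"], "status": "draft"}


# MCP gateway registry layer: controlled tool discovery and access.
REGISTRY: Dict[str, Callable[[dict], dict]] = {"quote.read": get_quote}

# Policy / guardrail layer: which tools each agent may use.
ALLOWED: Dict[str, Set[str]] = {"sales-copilot": {"quote.read"}}


def policy_allows(agent: str, tool: str) -> bool:
    return tool in ALLOWED.get(agent, set())


def handle_tool_call(agent: str, tool: str, args: dict) -> dict:
    """Route one tool call through policy, then the registry, then the server."""
    if not policy_allows(agent, tool):
        return {"ok": False, "error": "denied by policy"}
    handler = REGISTRY.get(tool)
    if handler is None:
        return {"ok": False, "error": "tool not registered"}
    return {"ok": True, "result": handler(args)}


# Orchestrator layer: decides what needs to be done, then asks the gateway.
def orchestrate(agent: str, quote_id: str) -> dict:
    return handle_tool_call(agent, "quote.read", {"quote_id": quote_id})
```

&lt;p&gt;Because each layer is a separate function, each one can be tested, replaced, and audited on its own.&lt;/p&gt;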

&lt;h2&gt;
  
  
  My Practical Recommendation
&lt;/h2&gt;

&lt;p&gt;For a real enterprise deployment, I would host the MCP Gateway Registry with this setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AWS ECS Fargate for services
CloudFront in front
AWS WAF enabled
Route 53 custom domains
ACM certificates
Application Load Balancer
Private subnets for ECS tasks
Aurora PostgreSQL for Keycloak
DocumentDB for registry metadata
Secrets Manager for credentials
CloudWatch for logs and alarms
Optional Grafana and Prometheus for deeper observability
S3 backend for Terraform state
DynamoDB for Terraform locking
CI/CD for image build and deployment
Immutable ECR image tags
Strict admin access
Scope-based authorization
Audit logs for all tool usage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a POC, I would keep it simple.&lt;/p&gt;

&lt;p&gt;For production, I would not compromise on security, logging, and access control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Lessons Learned
&lt;/h2&gt;

&lt;p&gt;The biggest lesson is this:&lt;/p&gt;

&lt;p&gt;Hosting the MCP Gateway Registry is not only an infrastructure activity. It is the beginning of an operating model for enterprise agents.&lt;/p&gt;

&lt;p&gt;If agents are going to use real tools, then organizations need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tool ownership
Tool approval
Tool discovery
Tool scopes
Tool observability
Tool lifecycle management
Tool risk classification
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this, agentic AI systems may work technically but fail operationally.&lt;/p&gt;

&lt;p&gt;And in enterprises, operational failure is usually what blocks adoption.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;MCP is making tool integration more standard for AI agents. That is a very important shift.&lt;/p&gt;

&lt;p&gt;But standardization also creates scale.&lt;/p&gt;

&lt;p&gt;And once we scale the number of agents and tools, we need governance.&lt;/p&gt;

&lt;p&gt;That is why an MCP Gateway Registry should be treated as a core platform capability, not as a side component.&lt;/p&gt;

&lt;p&gt;It gives engineering teams a structured way to expose tools.&lt;br&gt;
It gives security teams a way to control access.&lt;br&gt;
It gives platform teams a way to monitor usage.&lt;br&gt;
It gives business teams more confidence that agents are not directly and blindly touching enterprise systems.&lt;/p&gt;

&lt;p&gt;In my view, this is one of the important building blocks for production-grade agentic AI systems.&lt;/p&gt;

&lt;p&gt;The future will not be one agent directly connected to many tools.&lt;/p&gt;

&lt;p&gt;The future will be governed agent ecosystems, where tools are registered, discoverable, monitored, secured, and lifecycle-managed through a central control plane.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>agents</category>
      <category>ecs</category>
      <category>mcp</category>
    </item>
    <item>
      <title>When One AI Agent Is Not Enough: A Practical Delegation Pattern for Enterprise Systems</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Sat, 23 May 2026 17:50:08 +0000</pubDate>
      <link>https://dev.to/amitkayal/when-one-ai-agent-is-not-enough-a-practical-delegation-pattern-for-enterprise-systems-16nb</link>
      <guid>https://dev.to/amitkayal/when-one-ai-agent-is-not-enough-a-practical-delegation-pattern-for-enterprise-systems-16nb</guid>
      <description>&lt;h1&gt;
  
  
  When One AI Agent Is Not Enough: A Practical Delegation Pattern for Enterprise Systems
&lt;/h1&gt;

&lt;p&gt;A lot of enterprise AI systems start the same way.&lt;/p&gt;

&lt;p&gt;One agent.&lt;br&gt;
One big prompt.&lt;br&gt;
A bunch of tools.&lt;br&gt;
A lot of hope.&lt;/p&gt;

&lt;p&gt;At first, it looks great. The agent can answer questions, call a few systems, maybe even complete a useful workflow. But once the use case gets more realistic, cracks start to show.&lt;/p&gt;

&lt;p&gt;The agent has to understand too much.&lt;br&gt;
It has to access too many systems.&lt;br&gt;
It has to make too many different kinds of decisions.&lt;br&gt;
And when something goes wrong, it is hard to tell where the problem actually is.&lt;/p&gt;

&lt;p&gt;That is usually the point where the issue stops being “prompt quality” and starts becoming “system design.”&lt;/p&gt;

&lt;p&gt;One pattern I’ve found especially useful is delegation across agents and subagents.&lt;/p&gt;

&lt;p&gt;Not because it sounds advanced.&lt;br&gt;
Because it is often the more practical way to build enterprise AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real problem with a single large agent
&lt;/h2&gt;

&lt;p&gt;There is an appealing simplicity in saying, “Let one agent handle the whole thing.”&lt;/p&gt;

&lt;p&gt;But enterprise workflows are rarely that clean.&lt;/p&gt;

&lt;p&gt;Take something simple on the surface, like a customer escalation.&lt;/p&gt;

&lt;p&gt;To handle it well, the system may need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pull ticket history&lt;/li&gt;
&lt;li&gt;understand product context&lt;/li&gt;
&lt;li&gt;check support policy&lt;/li&gt;
&lt;li&gt;review account state&lt;/li&gt;
&lt;li&gt;recommend next actions&lt;/li&gt;
&lt;li&gt;trigger an internal workflow&lt;/li&gt;
&lt;li&gt;draft a reply&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Yes, one agent can try to do all of that.&lt;/p&gt;

&lt;p&gt;But in practice, the more responsibilities you pile into one agent, the more fragile it becomes.&lt;/p&gt;

&lt;p&gt;You usually end up with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;too much context going into one step&lt;/li&gt;
&lt;li&gt;too many tools available to one component&lt;/li&gt;
&lt;li&gt;weaker predictability&lt;/li&gt;
&lt;li&gt;weaker governance&lt;/li&gt;
&lt;li&gt;and much harder debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system may still “work,” but it becomes difficult to trust.&lt;/p&gt;

&lt;h1&gt;
  
  
  A better pattern: one lead agent, a few focused subagents
&lt;/h1&gt;

&lt;p&gt;The cleaner pattern is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Primary agent -&amp;gt; specialist subagents -&amp;gt; final outcome&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;primary agent&lt;/strong&gt; owns the workflow.&lt;/p&gt;

&lt;p&gt;Its job is to understand the request, decide what needs to happen, delegate the right pieces of work, and then combine the results.&lt;/p&gt;

&lt;p&gt;The subagents each do one thing well.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a retrieval subagent gets the right context&lt;/li&gt;
&lt;li&gt;a policy subagent checks rules or entitlements&lt;/li&gt;
&lt;li&gt;an analysis subagent recommends next steps&lt;/li&gt;
&lt;li&gt;an execution subagent handles approved downstream actions&lt;/li&gt;
&lt;li&gt;a communication subagent drafts the final message&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a much healthier design than asking one broad agent to do everything in one pass.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this pattern works better
&lt;/h2&gt;

&lt;p&gt;The first reason is simple: &lt;strong&gt;focus&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A retrieval subagent can focus on retrieval.&lt;br&gt;
A policy subagent can focus on policy.&lt;br&gt;
An execution subagent can focus on action.&lt;/p&gt;

&lt;p&gt;You are not forcing one component to juggle too many responsibilities.&lt;/p&gt;

&lt;p&gt;The second reason is control.&lt;/p&gt;

&lt;p&gt;Different subagents can have different permissions, different tools, and different operating boundaries. That is much easier to govern in enterprise systems.&lt;/p&gt;

&lt;p&gt;The third reason is observability.&lt;/p&gt;

&lt;p&gt;If the outcome is wrong, you have a better shot at knowing where it went wrong:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;bad retrieval&lt;/li&gt;
&lt;li&gt;wrong policy interpretation&lt;/li&gt;
&lt;li&gt;weak action selection&lt;/li&gt;
&lt;li&gt;poor response generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a huge advantage once the system moves beyond demo stage.&lt;/p&gt;

&lt;h1&gt;
  
  
  What the primary agent should actually do
&lt;/h1&gt;

&lt;p&gt;One mistake I see is treating the primary agent like a simple router.&lt;/p&gt;

&lt;p&gt;That is not enough.&lt;/p&gt;

&lt;p&gt;The primary agent should behave more like a coordinator.&lt;/p&gt;

&lt;p&gt;It should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;understand the incoming request&lt;/li&gt;
&lt;li&gt;decide what subtasks are needed&lt;/li&gt;
&lt;li&gt;choose the right subagents&lt;/li&gt;
&lt;li&gt;pass only the necessary context&lt;/li&gt;
&lt;li&gt;review what comes back&lt;/li&gt;
&lt;li&gt;and decide whether to continue, retry, escalate, or stop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, it owns the workflow logic.&lt;/p&gt;

&lt;p&gt;It should not blindly trust every subagent output.&lt;br&gt;
It should have judgment.&lt;/p&gt;

&lt;p&gt;That is what makes delegation useful rather than just decorative.&lt;/p&gt;
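&lt;p&gt;A coordinator of this kind can be sketched as a small loop. This is one possible shape under simplifying assumptions: subagents are plain functions returning &lt;code&gt;(ok, output)&lt;/code&gt;, the plan is fixed up front, and the names &lt;code&gt;coordinate&lt;/code&gt; and &lt;code&gt;max_retries&lt;/code&gt; are mine.&lt;/p&gt;

```python
from typing import Callable, Dict, List, Tuple

# A subagent is modelled as a function from context to (ok, output).
Subagent = Callable[[dict], Tuple[bool, dict]]
# A plan step names a subagent and the context keys it is allowed to see.
Step = Tuple[str, List[str]]


def coordinate(request: dict, plan: List[Step],
               subagents: Dict[str, Subagent], max_retries: int = 1) -> dict:
    """Run the planned subtasks in order, reviewing each result before moving on."""
    context = {"request": request}
    for name, needed in plan:
        agent = subagents.get(name)
        if agent is None:
            return {"status": "escalated", "reason": "no subagent for " + name}
        attempts = 0
        while True:
            # Pass only the necessary context, not everything gathered so far.
            visible = {key: context[key] for key in needed if key in context}
            ok, output = agent(visible)
            if ok:
                context[name] = output  # continue with the next subtask
                break
            attempts += 1
            if attempts > max_retries:
                # Retries exhausted: escalate instead of trusting a weak result.
                return {"status": "escalated", "reason": name + " failed"}
    return {"status": "done", "context": context}
```

&lt;p&gt;In a real system the review step would be richer than a boolean, but the control flow (continue, retry, escalate, stop) stays the same.&lt;/p&gt;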

&lt;h1&gt;
  
  
  What makes a good subagent
&lt;/h1&gt;

&lt;p&gt;A good subagent is narrow.&lt;/p&gt;

&lt;p&gt;That is probably the single most important design rule.&lt;/p&gt;

&lt;p&gt;Each subagent should ideally have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one clear job&lt;/li&gt;
&lt;li&gt;limited tools&lt;/li&gt;
&lt;li&gt;limited context&lt;/li&gt;
&lt;li&gt;a defined output format&lt;/li&gt;
&lt;li&gt;clear boundaries on what it should not do&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If a subagent is doing retrieval, analysis, execution, and communication together, it is no longer a real specialist.&lt;/p&gt;

&lt;p&gt;It is just another general-purpose agent with a different label. And once you do that, the value of delegation starts disappearing.&lt;/p&gt;

&lt;h1&gt;
  
  
  A sharper example
&lt;/h1&gt;

&lt;p&gt;Let’s go back to the customer escalation example.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bad design
&lt;/h3&gt;

&lt;p&gt;One large agent receives the case and tries to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read the issue&lt;/li&gt;
&lt;li&gt;search past history&lt;/li&gt;
&lt;li&gt;check policy&lt;/li&gt;
&lt;li&gt;assess severity&lt;/li&gt;
&lt;li&gt;decide the next action&lt;/li&gt;
&lt;li&gt;update internal systems&lt;/li&gt;
&lt;li&gt;draft the reply&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This may work sometimes.&lt;/p&gt;

&lt;p&gt;But it is too much responsibility in one place.&lt;/p&gt;

&lt;h3&gt;
  
  
  Better design
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Primary agent&lt;/strong&gt;&lt;br&gt;
Owns the overall case flow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retrieval subagent&lt;/strong&gt;&lt;br&gt;
Gathers ticket history, account context, product details, and related documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Policy subagent&lt;/strong&gt;&lt;br&gt;
Checks entitlement, SLA, escalation rules, and any support constraints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analysis subagent&lt;/strong&gt;&lt;br&gt;
Looks at the combined context and suggests the best next step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Execution subagent&lt;/strong&gt;&lt;br&gt;
Triggers the approved workflow, creates tasks, or updates systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Communication subagent&lt;/strong&gt;&lt;br&gt;
Drafts the customer-facing or internal message.&lt;/p&gt;

&lt;p&gt;Now the workflow is clearer.&lt;br&gt;
Each step is easier to test.&lt;br&gt;
And if the result is weak, you can usually tell why.&lt;/p&gt;

&lt;h1&gt;
  
  
  When delegation is worth it
&lt;/h1&gt;

&lt;p&gt;Not every use case needs this pattern.&lt;/p&gt;

&lt;p&gt;Sometimes one well-designed agent is enough.&lt;/p&gt;

&lt;p&gt;Delegation becomes useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the workflow crosses different domains&lt;/li&gt;
&lt;li&gt;different systems or permissions are involved&lt;/li&gt;
&lt;li&gt;some work can happen in parallel&lt;/li&gt;
&lt;li&gt;one agent is becoming overloaded&lt;/li&gt;
&lt;li&gt;governance starts getting messy&lt;/li&gt;
&lt;li&gt;you want better testing and failure isolation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the workflow is small and bounded, keep it simple.&lt;/p&gt;

&lt;p&gt;The point is not to add more agents for the sake of it.&lt;br&gt;
The point is to use delegation when specialization clearly improves the system.&lt;/p&gt;

&lt;h1&gt;
  
  
  Practical rules that help
&lt;/h1&gt;

&lt;h2&gt;
  
  
  1. Start with a small number of subagents
&lt;/h2&gt;

&lt;p&gt;Do not build a maze.&lt;/p&gt;

&lt;p&gt;Start with one primary agent and maybe two or three specialists. That is usually enough to prove whether the pattern is helping.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Keep context tight
&lt;/h2&gt;

&lt;p&gt;Do not pass everything to every agent.&lt;/p&gt;

&lt;p&gt;Each subagent should get only the context it actually needs. Too much context often makes outputs worse, not better.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Use structured outputs
&lt;/h2&gt;

&lt;p&gt;Subagents should return something predictable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a decision&lt;/li&gt;
&lt;li&gt;a label&lt;/li&gt;
&lt;li&gt;a ranked list&lt;/li&gt;
&lt;li&gt;a JSON object&lt;/li&gt;
&lt;li&gt;a recommendation plus confidence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not vague prose that another component has to guess at.&lt;/p&gt;
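&lt;p&gt;One way to enforce this is to parse every subagent reply into a typed object and reject anything that does not fit. The &lt;code&gt;PolicyResult&lt;/code&gt; contract below is a hypothetical example; the field names and allowed decisions are assumptions, not a standard.&lt;/p&gt;

```python
import json
from dataclasses import dataclass

ALLOWED_DECISIONS = {"allow", "deny", "escalate"}


@dataclass(frozen=True)
class PolicyResult:
    """Hypothetical output contract for a policy subagent."""
    decision: str       # one of ALLOWED_DECISIONS
    reason: str
    confidence: float   # 0.0 to 1.0

    def __post_init__(self):
        if self.decision not in ALLOWED_DECISIONS:
            raise ValueError("unknown decision: " + repr(self.decision))
        if not (1.0 >= self.confidence >= 0.0):
            raise ValueError("confidence must be between 0.0 and 1.0")


def parse_policy_result(raw: str) -> PolicyResult:
    """Parse a subagent's JSON reply, raising if it is off-contract."""
    data = json.loads(raw)  # raises ValueError on free-form prose
    return PolicyResult(
        decision=data["decision"],
        reason=data["reason"],
        confidence=float(data["confidence"]),
    )
```

&lt;p&gt;The primary agent then works with &lt;code&gt;result.decision&lt;/code&gt; and &lt;code&gt;result.confidence&lt;/code&gt; instead of guessing at free text.&lt;/p&gt;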

&lt;h2&gt;
  
  
  4. Design low-confidence paths
&lt;/h2&gt;

&lt;p&gt;If a subagent is not confident, that should trigger something explicit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retry&lt;/li&gt;
&lt;li&gt;clarification&lt;/li&gt;
&lt;li&gt;fallback logic&lt;/li&gt;
&lt;li&gt;human review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do not let weak outputs quietly flow into the rest of the chain.&lt;/p&gt;
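&lt;p&gt;As a sketch, the explicit path can be a small function that maps confidence to a next step. The thresholds (&lt;code&gt;0.8&lt;/code&gt; and &lt;code&gt;0.5&lt;/code&gt;), the retry limit, and the step names are illustrative assumptions that each team would tune. Asking the user for clarification is left out to keep the example short.&lt;/p&gt;

```python
def next_step(confidence: float, attempts: int,
              retry_limit: int = 2, accept_at: float = 0.8,
              review_below: float = 0.5) -> str:
    """Map a subagent's confidence to an explicit next step."""
    if confidence >= accept_at:
        return "accept"
    if review_below > confidence:
        # Very weak output: do not retry blindly, send it to a person.
        return "human_review"
    if retry_limit > attempts:
        return "retry"
    # Middling confidence and retries used up: use deterministic fallback logic.
    return "fallback"
```

&lt;p&gt;The important part is not the exact numbers but that every confidence level leads to a named outcome.&lt;/p&gt;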

&lt;h2&gt;
  
  
  5. Log the handoffs
&lt;/h2&gt;

&lt;p&gt;You need to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what task was delegated&lt;/li&gt;
&lt;li&gt;what context was passed&lt;/li&gt;
&lt;li&gt;what came back&lt;/li&gt;
&lt;li&gt;what happened next&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that, debugging becomes painful very quickly.&lt;/p&gt;
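&lt;p&gt;A thin wrapper around each delegation is usually enough to capture these four facts. The sketch below uses Python's standard &lt;code&gt;logging&lt;/code&gt; and &lt;code&gt;json&lt;/code&gt; modules; the &lt;code&gt;delegate&lt;/code&gt; function and its parameters are assumed names. It logs only the context keys, on the assumption that full context may contain sensitive data.&lt;/p&gt;

```python
import json
import logging
from typing import Callable

logger = logging.getLogger("handoffs")


def delegate(task: str, subagent_name: str,
             subagent: Callable[[dict], dict], context: dict,
             next_action: Callable[[dict], str]) -> dict:
    """Call a subagent and log the handoff as one JSON line."""
    result = subagent(context)
    entry = {
        "task": task,                      # what task was delegated
        "subagent": subagent_name,
        "context_keys": sorted(context),   # what context was passed (keys only)
        "result": result,                  # what came back
        "next": next_action(result),       # what happened next
    }
    logger.info(json.dumps(entry))
    return entry
```

&lt;p&gt;Each handoff then becomes one searchable log line, which makes it much easier to see where a bad outcome started.&lt;/p&gt;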

&lt;h2&gt;
  
  
  6. Control tools by role
&lt;/h2&gt;

&lt;p&gt;A retrieval subagent should not have broad execution rights.&lt;br&gt;
An execution subagent should not have unnecessary access to everything.&lt;br&gt;
Different responsibilities should have different permissions.&lt;/p&gt;

&lt;p&gt;That is one of the easiest ways to keep governance strong.&lt;/p&gt;
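&lt;p&gt;A simple way to keep this honest is to check tool assignments against the role before an agent is deployed. In this sketch the tool names, the read/write tagging, and the &lt;code&gt;violations&lt;/code&gt; helper are all hypothetical.&lt;/p&gt;

```python
# Hypothetical tool catalogue: each tool is tagged as "read" or "write".
TOOL_KIND = {
    "ticket.search": "read",
    "account.read": "read",
    "workflow.trigger": "write",
    "task.create": "write",
}

# Which kinds of tool each role is allowed to hold.
ROLE_KINDS = {
    "retrieval": {"read"},
    "execution": {"write"},
}


def violations(role: str, tools: list) -> list:
    """Return the tools that this role should not have been given."""
    allowed_kinds = ROLE_KINDS.get(role, set())
    # Unknown tools have no kind and are therefore always flagged.
    return [t for t in tools if TOOL_KIND.get(t) not in allowed_kinds]
```

&lt;p&gt;Running a check like this in CI catches a retrieval subagent that has quietly been given a write tool.&lt;/p&gt;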

&lt;h1&gt;
  
  
  Common mistakes
&lt;/h1&gt;

&lt;p&gt;A few patterns show up again and again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Too many agents too early&lt;/strong&gt;&lt;br&gt;
More moving parts do not automatically make the design better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Subagents with overlapping jobs&lt;/strong&gt;&lt;br&gt;
If roles are fuzzy, delegation becomes noisy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Passing all context everywhere&lt;/strong&gt;&lt;br&gt;
That weakens specialization fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No fallback design&lt;/strong&gt;&lt;br&gt;
One failed subtask should not silently break the whole workflow.&lt;/p&gt;

&lt;p&gt;Above all, treat delegation as an architecture pattern, not a prompt-quality fix.&lt;/p&gt;

&lt;h1&gt;
  
  
  Final thought
&lt;/h1&gt;

&lt;p&gt;Delegation across agents and subagents is one of the more practical patterns in enterprise AI.&lt;/p&gt;

&lt;p&gt;Not because it is clever.&lt;br&gt;
Because it reflects how real systems usually need to operate.&lt;/p&gt;

&lt;p&gt;The strongest setups are usually not the ones with the most agents.&lt;/p&gt;

&lt;p&gt;They are the ones where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the primary agent clearly owns the workflow&lt;/li&gt;
&lt;li&gt;the subagents are genuinely specialized&lt;/li&gt;
&lt;li&gt;the context is controlled&lt;/li&gt;
&lt;li&gt;the outputs are structured&lt;/li&gt;
&lt;li&gt;and the operating model is easy to debug and govern&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is what turns a multi-agent design from an interesting idea into something you can actually run in production.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>enterpriseai</category>
      <category>agenticai</category>
    </item>
    <item>
      <title>A Scaling Lesson Building Production-Grade Agentic AI Systems</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Tue, 19 May 2026 18:30:50 +0000</pubDate>
      <link>https://dev.to/amitkayal/a-scaling-lesson-building-production-grade-agentic-ai-systems-4kgp</link>
      <guid>https://dev.to/amitkayal/a-scaling-lesson-building-production-grade-agentic-ai-systems-4kgp</guid>
      <description>&lt;h1&gt;
  
  
  A Scaling Lesson Building Production-Grade Agentic AI Systems
&lt;/h1&gt;

&lt;p&gt;One of the early observations we had while designing enterprise AI agents was this:&lt;/p&gt;

&lt;p&gt;Giving an agent more tools does not necessarily make it smarter.&lt;/p&gt;

&lt;p&gt;In theory, the opposite sounded correct.&lt;/p&gt;

&lt;p&gt;If an agent had access to customer systems, payment systems, inventory, shipping, reporting, ticketing, email, scheduling, analytics, and internal knowledge bases, it should become more powerful and autonomous.&lt;/p&gt;

&lt;p&gt;But what we observed in real implementations was very different.&lt;/p&gt;

&lt;p&gt;The more tools we added, the more unstable the system became.&lt;/p&gt;

&lt;p&gt;Not because the model was weak.&lt;/p&gt;

&lt;p&gt;Not because the tools were poorly built.&lt;/p&gt;

&lt;p&gt;But because the agent’s decision space became too large.&lt;/p&gt;

&lt;p&gt;For every user request, the agent had to evaluate all available tools, compare descriptions, infer intent, decide sequencing, and determine the best execution path.&lt;/p&gt;

&lt;p&gt;Now imagine doing this with 18 tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer lookup&lt;/li&gt;
&lt;li&gt;Order search&lt;/li&gt;
&lt;li&gt;Refund processing&lt;/li&gt;
&lt;li&gt;Inventory checking&lt;/li&gt;
&lt;li&gt;Shipping tracking&lt;/li&gt;
&lt;li&gt;Email sending&lt;/li&gt;
&lt;li&gt;Ticket creation&lt;/li&gt;
&lt;li&gt;Knowledge base search&lt;/li&gt;
&lt;li&gt;Sentiment analysis&lt;/li&gt;
&lt;li&gt;Language translation&lt;/li&gt;
&lt;li&gt;Calendar scheduling&lt;/li&gt;
&lt;li&gt;Report generation&lt;/li&gt;
&lt;li&gt;Data export&lt;/li&gt;
&lt;li&gt;User authentication&lt;/li&gt;
&lt;li&gt;Payment processing&lt;/li&gt;
&lt;li&gt;Discount application&lt;/li&gt;
&lt;li&gt;Feedback collection&lt;/li&gt;
&lt;li&gt;Escalation routing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Initially, everything looked manageable.&lt;/p&gt;

&lt;p&gt;But as workflows became more dynamic, we started observing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;wrong tool selection,&lt;/li&gt;
&lt;li&gt;unnecessary tool chaining,&lt;/li&gt;
&lt;li&gt;higher latency,&lt;/li&gt;
&lt;li&gt;increased token usage,&lt;/li&gt;
&lt;li&gt;inconsistent execution paths,&lt;/li&gt;
&lt;li&gt;and occasional hallucinated actions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem was not intelligence.&lt;/p&gt;

&lt;p&gt;The problem was cognitive overload inside the orchestration layer.&lt;/p&gt;

&lt;p&gt;Over time, one pattern became very clear:&lt;/p&gt;

&lt;p&gt;Agents perform significantly better when their responsibility boundaries are smaller.&lt;/p&gt;

&lt;p&gt;In our experience, once an agent moves beyond roughly 4–5 actively usable tools, reliability starts dropping rapidly. Enterprise orchestration patterns point the same way: they now recommend smaller, specialized agents instead of monolithic “super agents.”&lt;/p&gt;

&lt;p&gt;That observation changed how we started designing AI systems.&lt;/p&gt;

&lt;p&gt;Instead of building one massive “do everything” agent, we moved toward specialized agents with tightly scoped responsibilities.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;A support agent handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;customer lookup,&lt;/li&gt;
&lt;li&gt;ticket creation,&lt;/li&gt;
&lt;li&gt;escalation routing,&lt;/li&gt;
&lt;li&gt;knowledge retrieval.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A commerce agent handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;orders,&lt;/li&gt;
&lt;li&gt;refunds,&lt;/li&gt;
&lt;li&gt;discounts,&lt;/li&gt;
&lt;li&gt;payments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An operations agent handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;shipping,&lt;/li&gt;
&lt;li&gt;inventory,&lt;/li&gt;
&lt;li&gt;reporting,&lt;/li&gt;
&lt;li&gt;exports.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This immediately improved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tool accuracy,&lt;/li&gt;
&lt;li&gt;execution consistency,&lt;/li&gt;
&lt;li&gt;observability,&lt;/li&gt;
&lt;li&gt;debugging,&lt;/li&gt;
&lt;li&gt;latency,&lt;/li&gt;
&lt;li&gt;and operational trust.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But another important learning came later.&lt;/p&gt;

&lt;p&gt;Even after distributing tools properly, systems still degraded when too many agents were active simultaneously.&lt;/p&gt;

&lt;p&gt;This is something many teams underestimate.&lt;/p&gt;

&lt;p&gt;As the number of agents increases, coordination overhead also increases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more inter-agent communication,&lt;/li&gt;
&lt;li&gt;more memory synchronization,&lt;/li&gt;
&lt;li&gt;more orchestration reasoning,&lt;/li&gt;
&lt;li&gt;more retries,&lt;/li&gt;
&lt;li&gt;more conflict resolution,&lt;/li&gt;
&lt;li&gt;and more state tracking.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At lower scale, this is manageable.&lt;/p&gt;

&lt;p&gt;At enterprise scale, it becomes a serious engineering challenge.&lt;/p&gt;

&lt;p&gt;We observed cases where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;agents started waiting on each other,&lt;/li&gt;
&lt;li&gt;orchestration layers became bottlenecks,&lt;/li&gt;
&lt;li&gt;duplicate reasoning increased token burn,&lt;/li&gt;
&lt;li&gt;cascading retries created operational instability,&lt;/li&gt;
&lt;li&gt;and observability became extremely difficult.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multi-agent systems introduce their own scaling complexity around coordination, governance, and orchestration overhead. Most production-grade architecture guidance today recommends keeping orchestration layers as simple as possible.&lt;/p&gt;

&lt;p&gt;Over time, we established a few practical rules of thumb internally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Some Practical Rules of Thumb We Follow Now
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Keep Tool Count Small Per Agent
&lt;/h4&gt;

&lt;p&gt;Our practical guideline today is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3–5 tools → ideal&lt;/li&gt;
&lt;li&gt;6–8 tools → manageable with careful prompting&lt;/li&gt;
&lt;li&gt;10+ tools → requires routing/filtering layers&lt;/li&gt;
&lt;li&gt;15+ tools → usually an architectural warning sign&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The issue is not model capability.&lt;/p&gt;

&lt;p&gt;It is decision dilution.&lt;/p&gt;
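&lt;p&gt;The bands above are easy to turn into an automated check on agent definitions. Note the assumptions in this sketch: the list does not say where 9 tools or fewer than 3 tools fall, so I treat 9 as part of the routing band and anything up to 5 as ideal. The function name is also mine.&lt;/p&gt;

```python
def tool_count_band(count: int) -> str:
    """Classify an agent's tool count using the bands listed above.

    Assumptions: 9 tools joins the routing band, and any count
    of 5 or fewer is treated as "ideal".
    """
    if count >= 15:
        return "architectural warning sign"
    if count >= 9:
        return "needs routing/filtering"
    if count >= 6:
        return "manageable with careful prompting"
    return "ideal"


# The 18-tool agent described earlier lands in the top band.
print(tool_count_band(18))
```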

&lt;h4&gt;
  
  
  2. Every Agent Must Have One Clear Business Responsibility
&lt;/h4&gt;

&lt;p&gt;We avoid mixing domains.&lt;/p&gt;

&lt;p&gt;For example, we do not combine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;payments + support,&lt;/li&gt;
&lt;li&gt;analytics + execution,&lt;/li&gt;
&lt;li&gt;reporting + approvals,&lt;/li&gt;
&lt;li&gt;inventory + customer engagement.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The narrower the responsibility boundary, the more predictable the behavior.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Start With the Lowest Complexity Possible
&lt;/h4&gt;

&lt;p&gt;One important learning from enterprise orchestration patterns is this:&lt;/p&gt;

&lt;p&gt;Do not introduce multi-agent architecture unless the workflow genuinely requires it.&lt;/p&gt;

&lt;p&gt;Often, something simpler is enough:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a single prompt,&lt;/li&gt;
&lt;li&gt;a single agent,&lt;/li&gt;
&lt;li&gt;or deterministic workflow orchestration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not every problem needs “AI teamwork.”&lt;/p&gt;

&lt;h4&gt;
  
  
  4. Avoid Excessive Agent-to-Agent Conversations
&lt;/h4&gt;

&lt;p&gt;Agent collaboration sounds powerful in demos.&lt;/p&gt;

&lt;p&gt;But in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;every interaction increases latency,&lt;/li&gt;
&lt;li&gt;every message consumes tokens,&lt;/li&gt;
&lt;li&gt;every dependency creates failure paths.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We now aggressively reduce unnecessary conversations between agents.&lt;/p&gt;

&lt;h4&gt;
  
  
  5. Retrieval Before Reasoning
&lt;/h4&gt;

&lt;p&gt;Instead of exposing all tools to all agents, we first narrow candidates through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;semantic routing,&lt;/li&gt;
&lt;li&gt;metadata filtering,&lt;/li&gt;
&lt;li&gt;RAG-based retrieval,&lt;/li&gt;
&lt;li&gt;workflow classification.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This significantly improves tool selection accuracy and reduces reasoning load.&lt;/p&gt;
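&lt;p&gt;A minimal sketch of this narrowing step, assuming a hypothetical in-memory tool catalog. It applies metadata filtering by domain first, then a naive keyword-overlap score that stands in for semantic routing or embedding retrieval. The names &lt;code&gt;Tool&lt;/code&gt;, &lt;code&gt;shortlist_tools&lt;/code&gt;, and the sample catalog are illustrative only and do not come from any specific framework.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    domain: str          # metadata used for filtering
    description: str     # text used for the naive relevance score


def shortlist_tools(query, domain, catalog, limit=3):
    """Return at most `limit` tool names worth showing to the agent."""
    query_words = set(query.lower().split())
    # Step 1: metadata filtering removes tools from other domains.
    candidates = [tool for tool in catalog if tool.domain == domain]
    # Step 2: rank by keyword overlap (a stand-in for embedding similarity).
    scored = [
        (len(query_words.intersection(tool.description.lower().split())), tool.name)
        for tool in candidates
    ]
    # Keep only tools that share at least one word with the query.
    relevant = [item for item in scored if item[0] != 0]
    relevant.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in relevant[:limit]]


# Hypothetical catalog used only for illustration.
CATALOG = [
    Tool("get_invoice", "payments", "fetch an invoice by id"),
    Tool("refund_payment", "payments", "refund a customer payment"),
    Tool("create_ticket", "support", "create a support ticket"),
    Tool("check_stock", "inventory", "check stock level for a product"),
]

print(shortlist_tools("refund the last payment", "payments", CATALOG))
```

&lt;p&gt;The agent then reasons over the shortlist only, so the prompt carries a handful of tool descriptions instead of the full catalog.&lt;/p&gt;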

&lt;h4&gt;
  
  
  6. Observability Is Mandatory
&lt;/h4&gt;

&lt;p&gt;Once systems become multi-agent, debugging becomes one of the hardest engineering problems.&lt;/p&gt;

&lt;p&gt;We now treat the following as first-class requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;distributed tracing,&lt;/li&gt;
&lt;li&gt;token tracking,&lt;/li&gt;
&lt;li&gt;step-level logging,&lt;/li&gt;
&lt;li&gt;execution replay,&lt;/li&gt;
&lt;li&gt;agent health monitoring,&lt;/li&gt;
&lt;li&gt;retry visibility,&lt;/li&gt;
&lt;li&gt;and orchestration graphs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without observability, production support becomes nearly impossible.&lt;/p&gt;
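&lt;p&gt;As an illustration of step-level logging with token tracking, here is a small sketch using only the Python standard library. The &lt;code&gt;traced_step&lt;/code&gt; helper, the event fields, and the in-memory &lt;code&gt;TRACE_EVENTS&lt;/code&gt; list are assumptions made for the example; a real deployment would ship these events to a tracing or logging backend.&lt;/p&gt;

```python
import json
import time
import uuid
from contextlib import contextmanager

TRACE_EVENTS = []  # in a real system these would go to a log or trace backend


@contextmanager
def traced_step(trace_id, agent, step):
    """Record one structured event per agent step, even when the step fails."""
    event = {"trace_id": trace_id, "agent": agent, "step": step,
             "tokens": 0, "status": "ok"}
    started = time.perf_counter()
    try:
        yield event  # the caller may update event["tokens"]
    except Exception as exc:
        event["status"] = "error"
        event["error"] = repr(exc)
        raise
    finally:
        event["duration_ms"] = round((time.perf_counter() - started) * 1000, 2)
        TRACE_EVENTS.append(event)
        print(json.dumps(event))


trace_id = str(uuid.uuid4())
with traced_step(trace_id, "pricing-agent", "lookup_price") as event:
    event["tokens"] = 412  # hypothetical token count reported by the model call

total_tokens = sum(item["tokens"] for item in TRACE_EVENTS)
```

&lt;p&gt;Because every event shares the same &lt;code&gt;trace_id&lt;/code&gt;, the steps of one request can be grouped later for replay, retry analysis, or token accounting.&lt;/p&gt;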

&lt;h4&gt;
  
  
  7. Human Escalation Is Still Critical
&lt;/h4&gt;

&lt;p&gt;One thing we intentionally avoid is trying to automate every decision.&lt;/p&gt;

&lt;p&gt;We now introduce human checkpoints for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;financial operations,&lt;/li&gt;
&lt;li&gt;policy-sensitive actions,&lt;/li&gt;
&lt;li&gt;low-confidence reasoning,&lt;/li&gt;
&lt;li&gt;and customer-impacting workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Autonomy without governance becomes operational risk.&lt;/p&gt;
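&lt;p&gt;A human checkpoint can be as simple as a gate in front of execution. The sketch below assumes each proposed action carries a category and a confidence score; the category names, the &lt;code&gt;CONFIDENCE_FLOOR&lt;/code&gt; value, and the &lt;code&gt;dispatch&lt;/code&gt; function are hypothetical and would need tuning per workflow.&lt;/p&gt;

```python
from dataclasses import dataclass

# Categories that always need a human checkpoint (mirrors the list above).
SENSITIVE_CATEGORIES = {"financial", "policy", "customer_impacting"}
CONFIDENCE_FLOOR = 0.8  # assumed threshold; tune per workflow


@dataclass
class ProposedAction:
    name: str
    category: str
    confidence: float  # hypothetical evaluator-assigned score between 0 and 1


def requires_human_approval(action):
    """Escalate sensitive categories and any low-confidence decision."""
    if action.category in SENSITIVE_CATEGORIES:
        return True
    return CONFIDENCE_FLOOR > action.confidence


def dispatch(action):
    """Route the action to a review queue or let it run automatically."""
    return "queued_for_review" if requires_human_approval(action) else "auto_executed"


print(dispatch(ProposedAction("issue_refund", "financial", 0.99)))
print(dispatch(ProposedAction("tag_ticket", "internal", 0.95)))
```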

&lt;p&gt;What I increasingly believe is that the future of enterprise AI is not one giant super-agent.&lt;/p&gt;

&lt;p&gt;It is orchestrated systems of smaller specialized agents collaborating through routing, delegation, memory sharing, and controlled execution.&lt;/p&gt;

&lt;p&gt;The real engineering challenge is no longer:&lt;br&gt;
“How many tools can an agent use?”&lt;/p&gt;

&lt;p&gt;The better question is:&lt;br&gt;
“How effectively can we reduce the decision burden for each agent while keeping orchestration manageable?”&lt;/p&gt;

&lt;p&gt;That has become one of the most important scaling lessons for us while building production-grade agentic AI systems.&lt;/p&gt;

&lt;h1&gt;
  
  
  How We Are Thinking About This in Cloud Architecture
&lt;/h1&gt;

&lt;p&gt;One important realization for us was that multi-agent systems should not be treated as a single application deployment.&lt;/p&gt;

&lt;p&gt;They should be treated as distributed cloud-native systems.&lt;/p&gt;

&lt;p&gt;That changes the architecture significantly.&lt;/p&gt;

&lt;p&gt;Today, the architecture pattern we increasingly follow looks something like this:&lt;/p&gt;

&lt;h2&gt;
  
  
  Specialized Agents as Independent Services
&lt;/h2&gt;

&lt;p&gt;Each agent runs independently with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;isolated APIs,&lt;/li&gt;
&lt;li&gt;dedicated scaling,&lt;/li&gt;
&lt;li&gt;separate observability,&lt;/li&gt;
&lt;li&gt;isolated memory/context,&lt;/li&gt;
&lt;li&gt;and domain-level permissions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This reduces blast radius and improves operational governance.&lt;/p&gt;

&lt;p&gt;In AWS, this naturally aligns very well with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda,&lt;/li&gt;
&lt;li&gt;ECS/EKS,&lt;/li&gt;
&lt;li&gt;event-driven services,&lt;/li&gt;
&lt;li&gt;queues,&lt;/li&gt;
&lt;li&gt;Bedrock,&lt;/li&gt;
&lt;li&gt;and serverless orchestration patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I personally liked while evaluating newer AWS patterns is how Amazon Bedrock AgentCore is trying to standardize several production concerns around agents. Instead of teams writing custom orchestration glue repeatedly, AgentCore is introducing managed capabilities around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;runtime isolation,&lt;/li&gt;
&lt;li&gt;observability,&lt;/li&gt;
&lt;li&gt;memory,&lt;/li&gt;
&lt;li&gt;identity,&lt;/li&gt;
&lt;li&gt;tool gateways,&lt;/li&gt;
&lt;li&gt;and orchestration patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One thing I strongly relate to from practical experience is this:&lt;/p&gt;

&lt;p&gt;Building the reasoning layer is usually not the hardest part anymore.&lt;/p&gt;

&lt;p&gt;The harder part is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;orchestration,&lt;/li&gt;
&lt;li&gt;debugging,&lt;/li&gt;
&lt;li&gt;tracing,&lt;/li&gt;
&lt;li&gt;retries,&lt;/li&gt;
&lt;li&gt;governance,&lt;/li&gt;
&lt;li&gt;and operational scalability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is where systems usually become unstable at scale.&lt;/p&gt;

&lt;p&gt;AgentCore Observability is also moving in an interesting direction by treating agent execution visibility as a first-class production capability with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;execution tracing,&lt;/li&gt;
&lt;li&gt;token monitoring,&lt;/li&gt;
&lt;li&gt;latency tracking,&lt;/li&gt;
&lt;li&gt;tool usage visibility,&lt;/li&gt;
&lt;li&gt;and CloudWatch integration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you have multiple agents collaborating dynamically, you need visibility into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why a tool was selected,&lt;/li&gt;
&lt;li&gt;which agent delegated the task,&lt;/li&gt;
&lt;li&gt;what context was shared,&lt;/li&gt;
&lt;li&gt;where retries happened,&lt;/li&gt;
&lt;li&gt;and why execution paths changed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this, production debugging becomes extremely difficult.&lt;/p&gt;

&lt;p&gt;Another pattern we increasingly prefer is asynchronous orchestration.&lt;/p&gt;

&lt;p&gt;Instead of tightly coupling agents synchronously, we now lean more toward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;queues,&lt;/li&gt;
&lt;li&gt;events,&lt;/li&gt;
&lt;li&gt;workflow engines,&lt;/li&gt;
&lt;li&gt;and loosely coupled communication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It improves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;resilience,&lt;/li&gt;
&lt;li&gt;scalability,&lt;/li&gt;
&lt;li&gt;retry handling,&lt;/li&gt;
&lt;li&gt;and fault isolation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most importantly, it prevents one overloaded agent from slowing down the entire system.&lt;/p&gt;
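&lt;p&gt;To make the decoupling concrete, here is a small in-process sketch using &lt;code&gt;asyncio.Queue&lt;/code&gt;. It only illustrates the pattern: in a cloud deployment the queue would typically be a managed queue or event bus, and the agent names and message shape here are invented for the example.&lt;/p&gt;

```python
import asyncio


async def research_agent(outbox):
    """Producer: publishes tasks and moves on without waiting for results."""
    for topic in ("pricing", "inventory", "contracts"):
        await outbox.put({"task": "summarize", "topic": topic})
    await outbox.put(None)  # sentinel: no more work


async def summary_agent(inbox, results):
    """Consumer: pulls work at its own pace from the shared queue."""
    while True:
        message = await inbox.get()
        if message is None:
            break
        await asyncio.sleep(0.01)  # stands in for a model or tool call
        results.append("summary:" + message["topic"])


async def main():
    queue = asyncio.Queue(maxsize=10)  # bounded queue applies backpressure
    results = []
    await asyncio.gather(research_agent(queue), summary_agent(queue, results))
    return results


print(asyncio.run(main()))
```

&lt;p&gt;The producer never calls the consumer directly, so a slow consumer fills the queue instead of blocking every upstream caller.&lt;/p&gt;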


</description>
      <category>agents</category>
      <category>aws</category>
      <category>agentcore</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Technical debt handling</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 11 May 2026 13:15:12 +0000</pubDate>
      <link>https://dev.to/amitkayal/technical-debt-handling-38on</link>
      <guid>https://dev.to/amitkayal/technical-debt-handling-38on</guid>
      <description>&lt;p&gt;Over the years, my opinion on technical debt has changed a lot. Earlier, I used to think technical debt meant bad engineering decisions.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Now I think differently&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In product companies, especially fast-moving SaaS and AI products, some level of technical debt is unavoidable. If teams try to make everything perfect from day one, they usually move too slowly.&lt;br&gt;
The real problem is not technical debt.&lt;br&gt;
The real problem is when nobody knows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why the shortcut was taken&lt;/li&gt;
&lt;li&gt;how long it can survive&lt;/li&gt;
&lt;li&gt;what impact it will create later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Personally, I look at technical debt in 3 broad categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strategic debt: Shortcuts taken consciously to move faster, validate ideas, or release quickly.&lt;/li&gt;
&lt;li&gt;Operational debt: Things that slowly start hurting deployments, production stability, debugging, support effort, and developer productivity.&lt;/li&gt;
&lt;li&gt;Architectural debt: This is the one that becomes dangerous over time. Scaling becomes harder, integrations become messy, releases become slower, and every new feature starts feeling more expensive to build.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I feel AI products make this even more complicated. In normal SaaS systems, debt usually impacts engineering speed. But in AI systems, technical debt can directly affect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;response quality&lt;/li&gt;
&lt;li&gt;hallucination handling&lt;/li&gt;
&lt;li&gt;latency&lt;/li&gt;
&lt;li&gt;observability&lt;/li&gt;
&lt;li&gt;model cost&lt;/li&gt;
&lt;li&gt;evaluation consistency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And because AI systems are probabilistic, debugging becomes much harder compared to traditional software.&lt;/p&gt;

&lt;p&gt;I’ve also seen SaaS platforms suffer heavily from invisible debt because of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multi-tenant complexity&lt;/li&gt;
&lt;li&gt;customer-specific customizations&lt;/li&gt;
&lt;li&gt;integrations&lt;/li&gt;
&lt;li&gt;deployment dependencies&lt;/li&gt;
&lt;li&gt;security and compliance requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One weak architectural decision early on can create pain for years.&lt;/p&gt;

&lt;p&gt;That’s why I personally prefer making technical debt visible and measurable instead of treating it as a future problem.&lt;/p&gt;

&lt;p&gt;Some of the signals I usually watch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deployment friction&lt;/li&gt;
&lt;li&gt;rollback frequency&lt;/li&gt;
&lt;li&gt;incident trends&lt;/li&gt;
&lt;li&gt;onboarding difficulty for new engineers&lt;/li&gt;
&lt;li&gt;release confidence&lt;/li&gt;
&lt;li&gt;overall engineering velocity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One pattern I’ve noticed repeatedly:&lt;br&gt;
When team size keeps increasing but delivery speed keeps dropping, technical debt is already affecting the organization.&lt;/p&gt;

</description>
      <category>design</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Learnings while working with long-running AI agents</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 11 May 2026 13:12:53 +0000</pubDate>
      <link>https://dev.to/amitkayal/learnings-while-working-with-long-running-ai-agents-pi9</link>
      <guid>https://dev.to/amitkayal/learnings-while-working-with-long-running-ai-agents-pi9</guid>
      <description>&lt;p&gt;One of my biggest learnings while working with long-running AI agents is that logging and progress reporting are not optional features when the agent is tightly coupled with a UI — they are part of the product experience itself.&lt;/p&gt;

&lt;p&gt;Initially, I thought of logging mainly from a debugging or engineering perspective. But with agentic systems, especially long-running workflows involving multiple tools, reasoning steps, APIs, retries, or multi-agent coordination, I realized users experience “silence” very differently than they do in traditional applications.&lt;br&gt;
When an agent takes 30 seconds, 2 minutes, or longer without visible progress, users immediately start questioning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the system stuck?&lt;/li&gt;
&lt;li&gt;Did my request fail?&lt;/li&gt;
&lt;li&gt;Is it doing the wrong thing?&lt;/li&gt;
&lt;li&gt;Should I refresh or retry?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That uncertainty destroys trust very quickly.&lt;br&gt;
I learned that users do not just want the final answer — they want confidence that the system is actively working toward the answer. Progress visibility creates psychological assurance. Even simple updates like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Analyzing uploaded documents…”&lt;/li&gt;
&lt;li&gt;“Fetching data from CRM…”&lt;/li&gt;
&lt;li&gt;“Generating recommendations…”&lt;/li&gt;
&lt;li&gt;“Validating final response…”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;dramatically improve user confidence and patience.&lt;br&gt;
Another major realization was that long-running agents are fundamentally non-deterministic systems. Unlike traditional APIs, agents can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;take different execution paths,&lt;/li&gt;
&lt;li&gt;loop through reasoning,&lt;/li&gt;
&lt;li&gt;invoke tools dynamically,&lt;/li&gt;
&lt;li&gt;retry failed steps,&lt;/li&gt;
&lt;li&gt;or spend time resolving ambiguity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without structured logging and traceability, debugging becomes extremely difficult because the same input may not always produce the same internal execution path. Modern AI observability practices emphasize tracing tool calls, reasoning paths, latency, token usage, and execution flow because agent behavior is inherently complex and probabilistic.&lt;/p&gt;

&lt;p&gt;I also learned that progress reporting is not only for users — it becomes equally important for engineering and operational visibility. Once agents move into production, observability helps teams identify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;where workflows slow down,&lt;/li&gt;
&lt;li&gt;which tool calls fail,&lt;/li&gt;
&lt;li&gt;why latency spikes happen,&lt;/li&gt;
&lt;li&gt;and where hallucinations or execution deviations originate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One practical lesson I learned is that UI-integrated agents should expose execution state intentionally, not dump raw logs. There is a difference between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;engineering telemetry,&lt;/li&gt;
&lt;li&gt;operational traces,&lt;/li&gt;
&lt;li&gt;and user-friendly progress communication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Users need understandable milestones, while engineers need deep execution traces.&lt;br&gt;
Another important learning was around perceived performance. In many cases, improving progress visibility improved user satisfaction more than reducing actual latency. A 90-second process with clear step-by-step reporting often feels faster and more reliable than a silent 40-second execution.&lt;/p&gt;
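&lt;p&gt;One way to keep those two audiences separate is to route every step through a single reporting function. This is a sketch under stated assumptions: the step names, the &lt;code&gt;USER_MESSAGES&lt;/code&gt; mapping, and the &lt;code&gt;notify_user&lt;/code&gt; callback are hypothetical, and the callback stands in for whatever channel the UI uses (for example a websocket or server-sent events).&lt;/p&gt;

```python
import logging

logger = logging.getLogger("agent.trace")

# Hypothetical mapping from internal step names to user-facing milestones.
USER_MESSAGES = {
    "parse_documents": "Analyzing uploaded documents…",
    "crm_lookup": "Fetching data from CRM…",
    "draft_answer": "Generating recommendations…",
    "validate_answer": "Validating final response…",
}


def report_step(step, detail, notify_user):
    """Send full detail to engineering logs and a short milestone to the UI."""
    logger.info("step=%s detail=%s", step, detail)  # deep trace for engineers
    message = USER_MESSAGES.get(step)
    if message is not None:
        notify_user(message)  # unmapped internal steps (e.g. retries) stay hidden


shown = []
report_step("crm_lookup", {"endpoint": "/accounts", "attempt": 1}, shown.append)
report_step("retry_backoff", {"sleep_s": 2}, shown.append)
print(shown)
```

&lt;p&gt;Only the mapped step reaches the user; the retry is still logged for engineers but never shown in the UI.&lt;/p&gt;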

&lt;p&gt;Today, I strongly believe that for long-running AI agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;logging is part of reliability,&lt;/li&gt;
&lt;li&gt;progress reporting is part of UX,&lt;/li&gt;
&lt;li&gt;and observability is part of trust.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>genai</category>
      <category>agents</category>
      <category>aws</category>
    </item>
    <item>
      <title>Building a Hybrid AWS Microservices Platform with API Gateway, Lambda, ECS, and Load Balancers</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 20 Apr 2026 18:41:39 +0000</pubDate>
      <link>https://dev.to/amitkayal/building-a-hybrid-aws-microservices-platform-with-api-gateway-lambda-ecs-and-load-balancers-mnn</link>
      <guid>https://dev.to/amitkayal/building-a-hybrid-aws-microservices-platform-with-api-gateway-lambda-ecs-and-load-balancers-mnn</guid>
      <description>&lt;h1&gt;
  
  
  Building a Hybrid AWS Microservices Platform with API Gateway, Lambda, ECS, and Load Balancers
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When teams start splitting a large backend into smaller services, the first infrastructure question is usually not "How do we build a microservice?" but "How do we expose many different services safely, consistently, and without creating a networking mess?"&lt;/p&gt;

&lt;p&gt;Our architecture provides a practical answer to that problem using a hybrid AWS design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API Gateway as the front door&lt;/li&gt;
&lt;li&gt;Lambda for lightweight serverless capabilities and supporting workflows&lt;/li&gt;
&lt;li&gt;ECS Fargate for containerized business services&lt;/li&gt;
&lt;li&gt;Internal load balancers for private service routing&lt;/li&gt;
&lt;li&gt;Terraform for repeatable, staged infrastructure delivery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important architectural idea is separation of concerns. Public access, authentication, routing, container execution, and service discovery are all handled by different layers. That keeps the platform easier to scale and much easier to evolve as the number of services grows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Pattern
&lt;/h2&gt;

&lt;p&gt;At a high level, the platform follows this flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A client sends an HTTPS request to API Gateway.&lt;/li&gt;
&lt;li&gt;API Gateway applies request-level controls such as API key enforcement, CORS behavior, and route matching.&lt;/li&gt;
&lt;li&gt;The request is sent either to a Lambda-backed endpoint or to a private containerized service.&lt;/li&gt;
&lt;li&gt;For ECS services, traffic goes through a VPC Link into internal load balancing.&lt;/li&gt;
&lt;li&gt;The load balancer forwards the request to the correct ECS service based on path rules.&lt;/li&gt;
&lt;li&gt;ECS Fargate runs one or more healthy tasks for that service and returns the response.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This gives a single API surface to consumers while allowing the backend implementation to vary by use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Combine Lambda and ECS?
&lt;/h2&gt;

&lt;p&gt;A platform like this benefits from using both compute models rather than forcing every workload into one.&lt;/p&gt;

&lt;p&gt;Lambda is a strong fit for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lightweight request handlers&lt;/li&gt;
&lt;li&gt;event-driven tasks&lt;/li&gt;
&lt;li&gt;simple orchestration&lt;/li&gt;
&lt;li&gt;platform support functions&lt;/li&gt;
&lt;li&gt;endpoints that do not need a full container lifecycle&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ECS Fargate is a better fit for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;long-lived HTTP microservices&lt;/li&gt;
&lt;li&gt;containerized frameworks and dependencies&lt;/li&gt;
&lt;li&gt;services that need more predictable runtime behavior&lt;/li&gt;
&lt;li&gt;APIs that benefit from load balancing, health checks, and horizontal scaling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In our architecture, the design supports both. Some APIs are routed to Lambda-based services, while others are routed to ECS services defined through service configuration. That hybrid model is useful in real organizations because not all services have the same runtime needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Three-Stage Infrastructure Model
&lt;/h2&gt;

&lt;p&gt;One of the strongest ideas in our architecture is the staged Terraform layout. Instead of deploying everything together, the infrastructure is split into three layers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 1: Networking
&lt;/h3&gt;

&lt;p&gt;The first stage establishes the network foundation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VPC selection or creation&lt;/li&gt;
&lt;li&gt;public and private subnet discovery or provisioning&lt;/li&gt;
&lt;li&gt;internal Network Load Balancer&lt;/li&gt;
&lt;li&gt;internal Application Load Balancer&lt;/li&gt;
&lt;li&gt;VPC Link for API Gateway&lt;/li&gt;
&lt;li&gt;ECS task security group&lt;/li&gt;
&lt;li&gt;ALB log storage and network observability components&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This stage is intentionally infrastructure-only. No application services are deployed here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 2: Compute
&lt;/h3&gt;

&lt;p&gt;The second stage provisions the actual execution environment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ECS cluster on Fargate&lt;/li&gt;
&lt;li&gt;ECR repositories for service images&lt;/li&gt;
&lt;li&gt;target groups per service&lt;/li&gt;
&lt;li&gt;ALB listener and listener rules&lt;/li&gt;
&lt;li&gt;ECS service definitions&lt;/li&gt;
&lt;li&gt;CloudWatch log groups&lt;/li&gt;
&lt;li&gt;Lambda functions used by the platform&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This stage consumes outputs from the networking stage so the compute layer never hardcodes network assumptions in its own design.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 3: API Gateways
&lt;/h3&gt;

&lt;p&gt;The third stage exposes services through API Gateway:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a public API for internet-facing consumption&lt;/li&gt;
&lt;li&gt;a private API for VPC-only access&lt;/li&gt;
&lt;li&gt;route creation from service metadata&lt;/li&gt;
&lt;li&gt;VPC Link integrations for containerized services&lt;/li&gt;
&lt;li&gt;Lambda proxy integrations for Lambda-backed services&lt;/li&gt;
&lt;li&gt;API keys, usage plans, and stage configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This split is operationally important. Teams can change routing without rebuilding networking, and they can add services without redesigning the entire platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Request Path for ECS Services
&lt;/h2&gt;

&lt;p&gt;For containerized microservices, the implementation follows a private ingress model.&lt;/p&gt;

&lt;p&gt;The path is:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Client -&amp;gt; API Gateway -&amp;gt; VPC Link -&amp;gt; internal NLB -&amp;gt; internal ALB -&amp;gt; ECS service -&amp;gt; ECS task&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That may look like one hop too many at first, but each layer has a purpose.&lt;/p&gt;

&lt;h3&gt;
  
  
  API Gateway
&lt;/h3&gt;

&lt;p&gt;API Gateway is the public entry point and policy layer. It handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TLS termination at the edge&lt;/li&gt;
&lt;li&gt;route exposure&lt;/li&gt;
&lt;li&gt;API key enforcement&lt;/li&gt;
&lt;li&gt;request and header mapping&lt;/li&gt;
&lt;li&gt;CORS handling&lt;/li&gt;
&lt;li&gt;stage-based deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It gives consumers a stable API contract while keeping the backend private.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why a VPC Link Is Used
&lt;/h3&gt;

&lt;p&gt;ECS services are not exposed directly to the internet. Instead, API Gateway connects privately into the VPC using a VPC Link. That allows the public API layer to reach internal services without making the services themselves public.&lt;/p&gt;

&lt;p&gt;This is a strong security pattern because the application runtime stays inside the VPC, but consumers still get a clean managed API endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why the Repository Uses Both NLB and ALB
&lt;/h3&gt;

&lt;p&gt;A useful implementation detail in our architecture is that the VPC Link targets an internal Network Load Balancer, and that NLB forwards to an internal Application Load Balancer.&lt;/p&gt;

&lt;p&gt;This arrangement provides two separate benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The NLB is used as the stable target for the API Gateway VPC Link.&lt;/li&gt;
&lt;li&gt;The ALB performs path-based routing to the actual microservices.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ALB is what makes many ECS services practical behind one internal entry point. Each service gets its own listener rule and target group, so the platform can route based on URL path rather than provisioning a separate load balancer per service.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Load Balancing Works
&lt;/h2&gt;

&lt;p&gt;The load-balancing model is service-oriented.&lt;/p&gt;

&lt;p&gt;Each ECS microservice contributes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a base API path&lt;/li&gt;
&lt;li&gt;an ALB path pattern&lt;/li&gt;
&lt;li&gt;a listener rule priority&lt;/li&gt;
&lt;li&gt;a container port&lt;/li&gt;
&lt;li&gt;a health check definition&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From that metadata, Terraform creates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one target group per service&lt;/li&gt;
&lt;li&gt;one listener rule per service&lt;/li&gt;
&lt;li&gt;one ECS service per service&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means the routing layer is not manually duplicated for every new microservice. The service declares its path and runtime settings, and the platform generates the infrastructure around it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Target Groups
&lt;/h3&gt;

&lt;p&gt;Each target group points to ECS tasks using IP targets. That is the correct choice for Fargate because each task runs with its own elastic network interface rather than on shared EC2 hosts.&lt;/p&gt;

&lt;p&gt;The target groups in this repository also use application-level health checks. A task is considered healthy only when its service endpoint responds successfully on the configured health path.&lt;/p&gt;

&lt;p&gt;That matters because container startup is not the same as application readiness. A service may be running from ECS's perspective but still not ready to receive traffic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Listener Rules
&lt;/h3&gt;

&lt;p&gt;The ALB listener is configured once, and each service gets a path-based rule. For example, a service under a quoting path can be matched independently from a service under a product-pricing path.&lt;/p&gt;

&lt;p&gt;This keeps the routing layer centralized and avoids deploying a dedicated ALB per service, which would become expensive and operationally noisy as the platform grows.&lt;/p&gt;
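&lt;p&gt;The routing behavior can be approximated in a few lines. ALB listener rules are evaluated in priority order, lowest number first, with a default action when nothing matches. The rule values, target group names, and the use of &lt;code&gt;fnmatchcase&lt;/code&gt; as a stand-in for ALB wildcard matching are assumptions for illustration, not the platform's actual configuration.&lt;/p&gt;

```python
from fnmatch import fnmatchcase

# Hypothetical listener rules: (priority, path pattern, target group name).
LISTENER_RULES = [
    (20, "/product-pricing/*", "tg-product-pricing"),
    (10, "/quoting/*", "tg-quoting"),
]
DEFAULT_TARGET = "fixed-response-404"  # assumed default listener action


def route(path, rules=LISTENER_RULES):
    """Return the target group for a path, checking lower priorities first."""
    for _, pattern, target in sorted(rules):
        if fnmatchcase(path, pattern):  # case-sensitive wildcard match
            return target
    return DEFAULT_TARGET


print(route("/quoting/v1/quotes"))
print(route("/product-pricing/items/42"))
print(route("/unknown"))
```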

&lt;h3&gt;
  
  
  Health Checks and Traffic Protection
&lt;/h3&gt;

&lt;p&gt;The repository uses health checks in multiple places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API health endpoints at the application level&lt;/li&gt;
&lt;li&gt;ALB target group health checks&lt;/li&gt;
&lt;li&gt;ECS service health grace periods&lt;/li&gt;
&lt;li&gt;container health checks inside the task definition&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That layered approach improves resilience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unhealthy tasks are removed from target groups&lt;/li&gt;
&lt;li&gt;ECS replaces failed tasks&lt;/li&gt;
&lt;li&gt;API Gateway continues to route through the same private entry point&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a platform that can recover from task-level failures without changing the public API contract.&lt;/p&gt;

&lt;h2&gt;
  
  
  How ECS Is Structured
&lt;/h2&gt;

&lt;p&gt;The ECS side of the platform is built for repeatability rather than one-off service definitions.&lt;/p&gt;

&lt;h3&gt;
  
  
  ECS Cluster
&lt;/h3&gt;

&lt;p&gt;The platform provisions a shared ECS cluster per environment. That allows multiple microservices to run within the same operational boundary while still being isolated at the task and service level.&lt;/p&gt;

&lt;p&gt;The cluster uses Fargate, which removes the need to manage EC2 worker nodes. This simplifies operations significantly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no patching of container hosts&lt;/li&gt;
&lt;li&gt;no cluster capacity management at the instance level&lt;/li&gt;
&lt;li&gt;easier scaling by task count&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reusable ECS Service Module
&lt;/h3&gt;

&lt;p&gt;Instead of defining each ECS service from scratch, the repository uses a reusable Terraform module for service deployment.&lt;/p&gt;

&lt;p&gt;That module is responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;task definition creation&lt;/li&gt;
&lt;li&gt;container logging configuration&lt;/li&gt;
&lt;li&gt;IAM role wiring&lt;/li&gt;
&lt;li&gt;ECS service creation&lt;/li&gt;
&lt;li&gt;target group attachment&lt;/li&gt;
&lt;li&gt;subnet and security group placement&lt;/li&gt;
&lt;li&gt;optional capacity provider strategy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a strong platform choice. It makes service onboarding consistent and reduces drift between services.&lt;/p&gt;

&lt;h3&gt;
  
  
  Task Definitions
&lt;/h3&gt;

&lt;p&gt;Each service runs as a Fargate task with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a named container image from ECR&lt;/li&gt;
&lt;li&gt;CPU and memory settings&lt;/li&gt;
&lt;li&gt;environment variables&lt;/li&gt;
&lt;li&gt;a health check command&lt;/li&gt;
&lt;li&gt;CloudWatch logging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repository also includes support for an additional X-Ray sidecar container in the task definition pattern, which is useful for distributed tracing in a microservice environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Network Mode
&lt;/h3&gt;

&lt;p&gt;Tasks run with &lt;code&gt;awsvpc&lt;/code&gt; networking, which gives each task its own network interface and private IP. This is the standard model for ECS on Fargate and is what allows ALB target groups to use IP mode cleanly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Subnet and Security Group Design
&lt;/h2&gt;

&lt;p&gt;This repository supports both existing/default VPC usage and a more segmented custom VPC model.&lt;/p&gt;

&lt;p&gt;That flexibility matters because many teams start in a default-VPC or dev-friendly setup and later move to stricter network isolation for staging and production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Subnet Placement
&lt;/h3&gt;

&lt;p&gt;The network layer discovers public and private subnets where available. In a custom VPC, the design supports proper private subnet deployment. In a simpler default VPC setup, the platform can fall back to available public subnets when private ones are not present.&lt;/p&gt;

&lt;p&gt;This is an important operational nuance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;development environments often optimize for simplicity&lt;/li&gt;
&lt;li&gt;higher environments usually optimize for stricter isolation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repository is built to handle both.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Groups
&lt;/h3&gt;

&lt;p&gt;The security model follows least-privilege intent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ECS tasks accept application traffic from the internal load-balancing layer&lt;/li&gt;
&lt;li&gt;services are not directly internet-facing&lt;/li&gt;
&lt;li&gt;API Gateway reaches backend services through private network integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps the application tier out of direct public exposure while still allowing a public API facade.&lt;/p&gt;

&lt;h2&gt;
  
  
  Config-Driven Service Onboarding
&lt;/h2&gt;

&lt;p&gt;One of the most scalable ideas in our architecture is that services are registered through configuration rather than by handcrafting infrastructure every time.&lt;/p&gt;

&lt;p&gt;There is a master service registry that lists enabled services per environment, and each service provides its own deployment metadata, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;service identity&lt;/li&gt;
&lt;li&gt;container port&lt;/li&gt;
&lt;li&gt;desired task count&lt;/li&gt;
&lt;li&gt;CPU and memory&lt;/li&gt;
&lt;li&gt;API base path&lt;/li&gt;
&lt;li&gt;ALB path pattern&lt;/li&gt;
&lt;li&gt;listener priority&lt;/li&gt;
&lt;li&gt;health check behavior&lt;/li&gt;
&lt;li&gt;logging retention&lt;/li&gt;
&lt;li&gt;autoscaling preferences&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates a platform model rather than a collection of unrelated microservices.&lt;/p&gt;

&lt;p&gt;Adding a new service becomes a repeatable process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create the service.&lt;/li&gt;
&lt;li&gt;Define its configuration.&lt;/li&gt;
&lt;li&gt;Register it in the service catalog.&lt;/li&gt;
&lt;li&gt;Build and publish the image.&lt;/li&gt;
&lt;li&gt;Apply Terraform stages.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is much easier to maintain than cloning infrastructure blocks over and over.&lt;/p&gt;
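&lt;p&gt;A config-driven model also makes it cheap to validate the registry before Terraform runs. The sketch below is an assumption about how such a check could look: the field names and sample services are invented, and the check relies on the fact that listener rule priorities must be unique on a given ALB listener.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical registry entries; field names are illustrative only.
SERVICES = [
    {"name": "quoting", "container_port": 8080, "desired_count": 2,
     "alb_path_pattern": "/quoting/*", "listener_priority": 10,
     "health_check_path": "/quoting/health"},
    {"name": "product-pricing", "container_port": 8080, "desired_count": 2,
     "alb_path_pattern": "/product-pricing/*", "listener_priority": 20,
     "health_check_path": "/product-pricing/health"},
]

REQUIRED_FIELDS = {"name", "container_port", "alb_path_pattern",
                   "listener_priority", "health_check_path"}


def validate_registry(services):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for service in services:
        missing = REQUIRED_FIELDS - set(service)
        if missing:
            problems.append(f"{service.get('name', '?')}: missing {sorted(missing)}")
    # Names, path patterns, and listener priorities must not collide.
    for field in ("name", "alb_path_pattern", "listener_priority"):
        counts = Counter(s[field] for s in services if field in s)
        for value, count in counts.items():
            if count != 1:
                problems.append(f"duplicate {field}: {value}")
    return problems


print(validate_registry(SERVICES))
```

&lt;p&gt;Running a check like this in CI catches a priority or path collision at review time rather than during a Terraform apply.&lt;/p&gt;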

&lt;h2&gt;
  
  
  Container Delivery with ECR
&lt;/h2&gt;

&lt;p&gt;For ECS workloads, the container supply chain is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build the service image.&lt;/li&gt;
&lt;li&gt;Push it to an ECR repository.&lt;/li&gt;
&lt;li&gt;Reference the tagged image in the ECS task definition.&lt;/li&gt;
&lt;li&gt;Update the ECS service to roll out the new task definition.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Our platform provisions one ECR repository per service, with image scanning enabled. That is a good baseline for a microservices platform because it keeps artifacts separated by service while still following a common naming convention.&lt;/p&gt;
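
&lt;p&gt;As a small illustration of the naming convention, the sketch below builds the image URI that a task definition would reference. The registry hostname format is the standard ECR one; the &lt;code&gt;platform/service&lt;/code&gt; repository prefix, the account ID, and the tag are assumptions made up for the example.&lt;/p&gt;

```python
def repository_name(platform, service):
    """One repository per service under a shared prefix (assumed convention)."""
    return f"{platform}/{service}"


def ecr_image_uri(account_id, region, repository, tag):
    """Build the image URI that an ECS task definition would reference."""
    if not tag:
        # Requiring an explicit tag avoids deploying whatever "latest" happens to be.
        raise ValueError("an explicit image tag is required")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder account ID and tag, for illustration only.
uri = ecr_image_uri("123456789012", "us-east-1", repository_name("platform", "orders"), "1.4.2")
print(uri)
```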

&lt;p&gt;There is also an explicit deployment phase between infrastructure provisioning and API exposure where container images are built and pushed. That is a practical real-world step many diagrams omit, but it is essential because ECS cannot run a service until the image exists in the registry.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Lambda Fits into the Platform
&lt;/h2&gt;

&lt;p&gt;Lambda is used here as a first-class platform option, not as an afterthought.&lt;/p&gt;

&lt;p&gt;There are two useful Lambda patterns in our architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Lambda as an API Backend
&lt;/h3&gt;

&lt;p&gt;Some services can be exposed through API Gateway using Lambda proxy integration. This is ideal for capabilities that are naturally event-driven, lightweight, or operationally simpler as functions than as always-on containers.&lt;/p&gt;

&lt;p&gt;In this model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API Gateway owns the route&lt;/li&gt;
&lt;li&gt;Lambda executes the business logic&lt;/li&gt;
&lt;li&gt;API Gateway returns the Lambda response directly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This avoids unnecessary load-balancer and container overhead for smaller workloads.&lt;/p&gt;
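
&lt;p&gt;A minimal sketch of a handler behind Lambda proxy integration is shown below. With proxy integration the function receives the HTTP request as a dictionary and must return a dictionary with &lt;code&gt;statusCode&lt;/code&gt;, &lt;code&gt;headers&lt;/code&gt;, and a string &lt;code&gt;body&lt;/code&gt;. The routing logic and the payload are placeholders, not our actual service code.&lt;/p&gt;

```python
import json


def _response(status_code, payload):
    """Shape the return value the way proxy integration expects."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),  # the body must be a string
    }


def handler(event, context):
    """Minimal handler for an API Gateway Lambda proxy integration."""
    # httpMethod and path are part of the REST API proxy event format.
    method = event.get("httpMethod", "GET")
    path = event.get("path", "/")
    if method != "GET":
        return _response(405, {"error": "method not allowed"})
    # Placeholder business logic: echo the path back.
    return _response(200, {"path": path, "status": "ok"})
```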

&lt;h3&gt;
  
  
  2. Lambda as a Platform Support Function
&lt;/h3&gt;

&lt;p&gt;Our architecture also provisions Lambda functions that support the overall platform, such as authentication-related or onboarding-related workflows.&lt;/p&gt;

&lt;p&gt;This is a smart use of Lambda in a hybrid platform because not every supporting concern needs to run inside ECS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Authentication and API Protection
&lt;/h2&gt;

&lt;p&gt;Our architecture clearly treats API protection as an API Gateway concern.&lt;/p&gt;

&lt;p&gt;The current public API implementation enforces API key usage through API Gateway methods, API keys, and usage plans. The codebase also provisions a supporting API key validation Lambda function and related permissions, which shows the platform is designed to accommodate Lambda-based validation flows where needed.&lt;/p&gt;

&lt;p&gt;The important architectural takeaway is this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keep authentication and traffic governance at the gateway layer&lt;/li&gt;
&lt;li&gt;keep service containers focused on business logic&lt;/li&gt;
&lt;li&gt;keep private workloads private&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That separation keeps the platform easier to secure and easier to reason about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Public and Private API Models
&lt;/h2&gt;

&lt;p&gt;Another strength of our architecture is that it supports both public and private APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Public API
&lt;/h3&gt;

&lt;p&gt;The public API is intended for internet-facing access. It handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;external client access&lt;/li&gt;
&lt;li&gt;API keys and usage plans&lt;/li&gt;
&lt;li&gt;CORS behavior&lt;/li&gt;
&lt;li&gt;Lambda and ECS route exposure&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Private API
&lt;/h3&gt;

&lt;p&gt;The private API is intended for internal or VPC-scoped access. It is useful when services should only be reachable from trusted network boundaries such as internal AWS workloads, integration environments, or enterprise connectivity paths.&lt;/p&gt;

&lt;p&gt;This split is helpful when some capabilities should be public and others should remain internal even though they share the same service platform underneath.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability and Operations
&lt;/h2&gt;

&lt;p&gt;A microservices platform is only as good as its operational visibility.&lt;/p&gt;

&lt;p&gt;Our architecture includes observability at several levels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CloudWatch log groups for ECS services&lt;/li&gt;
&lt;li&gt;CloudWatch logs for Lambda functions&lt;/li&gt;
&lt;li&gt;API Gateway stage logging&lt;/li&gt;
&lt;li&gt;ALB logging support&lt;/li&gt;
&lt;li&gt;VPC flow logging&lt;/li&gt;
&lt;li&gt;X-Ray-friendly task patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination helps answer the most common production questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did the request reach the gateway?&lt;/li&gt;
&lt;li&gt;Was it routed to the right backend?&lt;/li&gt;
&lt;li&gt;Was the target healthy?&lt;/li&gt;
&lt;li&gt;Did the service fail or time out?&lt;/li&gt;
&lt;li&gt;Was the problem in networking, routing, or application logic?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that layered visibility, hybrid platforms become difficult to troubleshoot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling Characteristics
&lt;/h2&gt;

&lt;p&gt;This architecture scales well because each layer can scale largely independently of the others.&lt;/p&gt;

&lt;h3&gt;
  
  
  API Layer Scaling
&lt;/h3&gt;

&lt;p&gt;API Gateway absorbs public traffic without requiring the backend to manage edge-facing concerns directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  ECS Scaling
&lt;/h3&gt;

&lt;p&gt;ECS services scale by task count. Each service can define:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;desired count&lt;/li&gt;
&lt;li&gt;minimum and maximum capacity&lt;/li&gt;
&lt;li&gt;CPU and memory sizing&lt;/li&gt;
&lt;li&gt;autoscaling thresholds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means heavily used services can scale out without affecting lighter services.&lt;/p&gt;

&lt;h3&gt;
  
  
  Platform Growth
&lt;/h3&gt;

&lt;p&gt;As more services are added, the platform does not need a new ingress pattern each time. The same path-based routing model continues to work as long as route definitions and listener priorities stay clean.&lt;/p&gt;
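
&lt;p&gt;Keeping priorities clean is easy to check mechanically before Terraform runs. The sketch below assumes each service contributes a &lt;code&gt;(service, path_pattern, priority)&lt;/code&gt; tuple; the function name and tuple layout are hypothetical. It relies on two ALB facts: rule priorities on a listener must be unique, and they range from 1 to 50000.&lt;/p&gt;

```python
from collections import Counter


def check_listener_rules(rules):
    """Return a list of problems found in (service, path_pattern, priority) rules."""
    problems = []
    priority_counts = Counter(priority for _, _, priority in rules)
    pattern_counts = Counter(pattern for _, pattern, _ in rules)
    for service, pattern, priority in rules:
        # ALB listener rule priorities range from 1 to 50000.
        if priority not in range(1, 50001):
            problems.append(f"{service}: priority {priority} out of range")
        # Two rules on the same listener cannot share a priority.
        if priority_counts[priority] != 1:
            problems.append(f"{service}: priority {priority} is reused")
        # A repeated pattern means one of the services would never receive traffic.
        if pattern_counts[pattern] != 1:
            problems.append(f"{service}: path pattern {pattern} is reused")
    return problems
```

&lt;p&gt;Running a check like this in CI turns a confusing apply-time failure into a readable message that names the offending service.&lt;/p&gt;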

&lt;h2&gt;
  
  
  Alignment with AWS Well-Architected Best Practices
&lt;/h2&gt;

&lt;p&gt;This architecture also maps onto the pillars of the AWS Well-Architected Framework.&lt;/p&gt;

&lt;h3&gt;
  
  
  Operational Excellence
&lt;/h3&gt;

&lt;p&gt;We have structured the platform so that it is operated as a system rather than as a collection of one-off deployments.&lt;/p&gt;

&lt;p&gt;This is reflected in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;staged Terraform deployments for clearer ownership and safer changes&lt;/li&gt;
&lt;li&gt;configuration-driven service onboarding&lt;/li&gt;
&lt;li&gt;consistent ECS service patterns through reusable modules&lt;/li&gt;
&lt;li&gt;standardized logging and deployment workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This reduces manual drift and makes operational changes more repeatable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security
&lt;/h3&gt;

&lt;p&gt;Security is addressed through layered controls rather than a single protection point.&lt;/p&gt;

&lt;p&gt;We have adhered to good AWS security practices by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;placing ECS services behind private networking rather than exposing them directly&lt;/li&gt;
&lt;li&gt;using API Gateway as the controlled ingress layer&lt;/li&gt;
&lt;li&gt;applying API-level protection at the gateway&lt;/li&gt;
&lt;li&gt;using security groups to limit east-west traffic&lt;/li&gt;
&lt;li&gt;supporting encrypted log and storage patterns&lt;/li&gt;
&lt;li&gt;separating public access from internal service routing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This follows the AWS principles of strong boundaries, least privilege, and defense in depth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reliability
&lt;/h3&gt;

&lt;p&gt;Reliability comes from designing for failure at the service and routing layers.&lt;/p&gt;

&lt;p&gt;We have incorporated that through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multi-AZ subnet placement&lt;/li&gt;
&lt;li&gt;load balancer health checks&lt;/li&gt;
&lt;li&gt;ECS task replacement behavior&lt;/li&gt;
&lt;li&gt;target group isolation per service&lt;/li&gt;
&lt;li&gt;decoupled gateway and backend layers&lt;/li&gt;
&lt;li&gt;staged infrastructure dependencies with clear outputs between layers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means a failing task or unhealthy target does not require the API surface itself to change.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance Efficiency
&lt;/h3&gt;

&lt;p&gt;The architecture chooses the right compute model for the right workload.&lt;/p&gt;

&lt;p&gt;That is an AWS best practice because it avoids forcing every workload, whether bursty or steady, onto the same compute model.&lt;/p&gt;

&lt;p&gt;Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda for lighter, event-oriented, or supporting workflows&lt;/li&gt;
&lt;li&gt;ECS Fargate for containerized services that need steady HTTP handling&lt;/li&gt;
&lt;li&gt;ALB path-based routing for efficient multi-service consolidation&lt;/li&gt;
&lt;li&gt;service-specific CPU, memory, and scaling settings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This lets us tune services independently instead of overprovisioning everything at the platform level.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost Optimization
&lt;/h3&gt;

&lt;p&gt;Cost optimization is also visible in the design choices.&lt;/p&gt;

&lt;p&gt;We are not multiplying infrastructure unnecessarily. Instead, the architecture encourages shared but controlled platform components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one API layer for many services&lt;/li&gt;
&lt;li&gt;one internal routing layer for many ECS workloads&lt;/li&gt;
&lt;li&gt;shared ECS cluster patterns per environment&lt;/li&gt;
&lt;li&gt;service-level scaling instead of blanket scaling&lt;/li&gt;
&lt;li&gt;support for Fargate and optional capacity-provider strategies where appropriate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is much closer to AWS best practice than provisioning separate ingress and compute stacks for every small service.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sustainability and Maintainability
&lt;/h3&gt;

&lt;p&gt;Even when sustainability is not called out directly, maintainable designs usually consume fewer engineering and infrastructure resources over time.&lt;/p&gt;

&lt;p&gt;The architecture helps here by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reducing duplicated infrastructure definitions&lt;/li&gt;
&lt;li&gt;making service onboarding metadata-driven&lt;/li&gt;
&lt;li&gt;encouraging reuse of shared platform components&lt;/li&gt;
&lt;li&gt;keeping the public contract stable while backend services evolve&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That leads to lower long-term complexity, which is a practical form of architectural efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Pattern Works Well
&lt;/h2&gt;

&lt;p&gt;This AWS pattern is effective because it balances standardization with flexibility.&lt;/p&gt;

&lt;p&gt;It standardizes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deployment stages&lt;/li&gt;
&lt;li&gt;ingress architecture&lt;/li&gt;
&lt;li&gt;service registration&lt;/li&gt;
&lt;li&gt;load-balancer behavior&lt;/li&gt;
&lt;li&gt;logging and health checks&lt;/li&gt;
&lt;li&gt;ECS service creation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It stays flexible by allowing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda-backed endpoints&lt;/li&gt;
&lt;li&gt;ECS-backed endpoints&lt;/li&gt;
&lt;li&gt;public and private APIs&lt;/li&gt;
&lt;li&gt;different service-level scaling and runtime settings&lt;/li&gt;
&lt;li&gt;multiple environments with different networking strategies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly what a growing microservices platform needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Implementation Advice
&lt;/h2&gt;

&lt;p&gt;If you want to implement a similar architecture, a good sequence is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build the networking foundation first.&lt;/li&gt;
&lt;li&gt;Keep all service backends private.&lt;/li&gt;
&lt;li&gt;Put API Gateway in front of everything external.&lt;/li&gt;
&lt;li&gt;Use ECS Fargate for containerized APIs that benefit from long-lived service behavior.&lt;/li&gt;
&lt;li&gt;Use Lambda for support functions and lightweight endpoints.&lt;/li&gt;
&lt;li&gt;Register services through metadata, not repetitive infrastructure definitions.&lt;/li&gt;
&lt;li&gt;Use path-based ALB routing so many services can share one internal ingress layer.&lt;/li&gt;
&lt;li&gt;Add strong health checks and centralized logs before traffic grows.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key is not just choosing AWS services, but assigning each AWS service a clear responsibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Our architecture demonstrates a mature way to implement Lambda- and ECS-based microservices behind API Gateway without exposing backend services directly.&lt;/p&gt;

&lt;p&gt;The architecture uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;staged Terraform for separation of concerns&lt;/li&gt;
&lt;li&gt;API Gateway as the public and private API facade&lt;/li&gt;
&lt;li&gt;Lambda where serverless execution makes sense&lt;/li&gt;
&lt;li&gt;ECS Fargate for containerized microservices&lt;/li&gt;
&lt;li&gt;NLB and ALB together for private, path-aware routing&lt;/li&gt;
&lt;li&gt;config-driven onboarding for scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For teams building an enterprise microservices platform, this is a strong pattern because it supports security, operational clarity, and service growth without forcing every workload into the same runtime model.&lt;/p&gt;

&lt;p&gt;Most importantly, it turns infrastructure into a reusable platform. Once that platform is in place, adding the next service becomes much easier than adding the first one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lessons Learned
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Keeping API Gateway as the front door and backend services private makes the architecture easier to secure and easier to evolve.&lt;/li&gt;
&lt;li&gt;Using both Lambda and ECS is more practical than forcing every use case into a single compute model.&lt;/li&gt;
&lt;li&gt;Path-based routing through shared internal load balancing scales better than creating isolated ingress infrastructure for every service.&lt;/li&gt;
&lt;li&gt;Service onboarding becomes significantly easier when routing, health checks, scaling, and runtime settings are driven by configuration.&lt;/li&gt;
&lt;li&gt;Health checks, logging, and observability need to be designed from the beginning; adding them later is much harder in a distributed system.&lt;/li&gt;
&lt;li&gt;A staged infrastructure model reduces operational risk because networking, compute, and API exposure can be changed independently.&lt;/li&gt;
&lt;li&gt;Standardizing platform patterns early saves substantial effort as the number of microservices grows.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>architecture</category>
      <category>aws</category>
      <category>lambda</category>
      <category>apigateway</category>
    </item>
    <item>
      <title>Building a Practical Lambda Capacity Provider Platform: Lessons Learned from Warm Pools, Version Hygiene, and CI/CD Reality</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 20 Apr 2026 18:25:06 +0000</pubDate>
      <link>https://dev.to/amitkayal/building-a-practical-lambda-capacity-provider-platform-lessons-learned-from-warm-pools-version-1l7j</link>
      <guid>https://dev.to/amitkayal/building-a-practical-lambda-capacity-provider-platform-lessons-learned-from-warm-pools-version-1l7j</guid>
      <description>&lt;h1&gt;
  
  
  Building a Practical Lambda Capacity Provider Platform: Lessons Learned from Warm Pools, Version Hygiene, and CI/CD Reality
&lt;/h1&gt;

&lt;p&gt;There is a big difference between a slide-deck architecture and a running system you can trust on a Monday morning.&lt;/p&gt;

&lt;p&gt;This implementation captures that difference well. On paper, the idea is simple: create a shared AWS Lambda Managed Instances capacity provider, run latency-sensitive workloads on ARM64, keep the pool warm with EventBridge, prune old Lambda versions before they become operational debt, and wrap the whole thing in a GitHub Actions plus CodeBuild delivery model. In practice, each of those choices changes how you think about performance, cost, blast radius, and developer discipline.&lt;/p&gt;

&lt;p&gt;What follows is not a generic cloud post. It is the kind of write-up you produce after actually building and living with the system.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Problem We Were Solving
&lt;/h2&gt;

&lt;p&gt;Traditional Lambda is excellent when you want abstraction and convenience. It becomes less elegant when your workload is sensitive to startup time, carries heavier dependencies, or needs more predictable execution behavior under bursty load.&lt;/p&gt;

&lt;p&gt;That is where a Lambda capacity provider changes the discussion.&lt;/p&gt;

&lt;p&gt;In this implementation, the platform is built around a shared &lt;code&gt;aws_lambda_capacity_provider&lt;/code&gt; that uses ARM64 Graviton instances and auto scaling. The core idea is straightforward: instead of leaving execution placement entirely to the default Lambda fleet, we deliberately provide a managed compute pool that multiple functions can share. That gives us more control over cost-performance characteristics and lets us design around cold-start pain rather than merely complain about it.&lt;/p&gt;

&lt;p&gt;The choice is visible in the Terraform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The provider runs on &lt;code&gt;arm64&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Allowed instance types are constrained to &lt;code&gt;m6g.large&lt;/code&gt;, &lt;code&gt;m6g.xlarge&lt;/code&gt;, &lt;code&gt;m7g.large&lt;/code&gt;, and &lt;code&gt;m7g.xlarge&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Scaling is set to &lt;code&gt;Auto&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The maximum pool ceiling is set to &lt;code&gt;64&lt;/code&gt; vCPU&lt;/li&gt;
&lt;li&gt;The capacity provider is placed in the default VPC, with unsupported Availability Zones filtered out&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point matters more than it first appears. The code explicitly excludes unsupported AZs such as &lt;code&gt;us-east-1e&lt;/code&gt;, which is a good example of operational maturity: the happy path is not enough when the service itself has placement constraints.&lt;/p&gt;

&lt;h2&gt;
  
  
  How We Actually Created the Capacity Provider
&lt;/h2&gt;

&lt;p&gt;One thing I wanted this platform to avoid was "concept architecture" with no implementation backbone. So the capacity provider here is not described abstractly. It is provisioned directly in Terraform and wired into the Lambda lifecycle in a fairly intentional way.&lt;/p&gt;

&lt;p&gt;The build starts in &lt;code&gt;terraform_file/agent_core_sync_cp.tf&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;First, the capacity provider itself is created with &lt;code&gt;aws_lambda_capacity_provider&lt;/code&gt;. The naming pattern ties it to the service and environment, which is the right instinct for multi-environment operation. The provider is tagged as shared compute for agent workloads, which matters later for discoverability and platform governance.&lt;/p&gt;

&lt;p&gt;Second, the provider is placed inside the default VPC, but not blindly. In &lt;code&gt;terraform_file/data.tf&lt;/code&gt;, the code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;discovers the default VPC&lt;/li&gt;
&lt;li&gt;fetches the default subnets&lt;/li&gt;
&lt;li&gt;inspects subnet Availability Zones one by one&lt;/li&gt;
&lt;li&gt;excludes unsupported zones such as &lt;code&gt;us-east-1e&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;optionally caps how many subnets are used&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a subtle but important design choice. Lambda Managed Instances often create one placement footprint per subnet or AZ. If you do not control subnet spread, you can end up creating more infrastructure surface area than you intended.&lt;/p&gt;
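
&lt;p&gt;The real filtering lives in Terraform data sources, but the selection logic itself is simple enough to sketch in a few lines of Python. The function name, the tuple layout, and the subnet IDs below are illustrative assumptions.&lt;/p&gt;

```python
def select_subnets(subnets, excluded_azs, max_subnets=None):
    """Pick subnet IDs for the capacity provider.

    subnets is a list of (subnet_id, availability_zone) pairs.
    """
    # Sorting first makes the result stable from one run to the next.
    allowed = [sid for sid, az in sorted(subnets) if az not in excluded_azs]
    # Slicing with None keeps every allowed subnet.
    return allowed[:max_subnets]


# Hypothetical subnet IDs, for illustration only.
subnets = [
    ("subnet-c", "us-east-1e"),
    ("subnet-a", "us-east-1a"),
    ("subnet-b", "us-east-1b"),
]
print(select_subnets(subnets, {"us-east-1e"}, max_subnets=2))
# prints ['subnet-a', 'subnet-b']
```

&lt;p&gt;Sorting before capping matters: without a stable order, the chosen subnets could change between plans and cause unnecessary churn.&lt;/p&gt;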

&lt;p&gt;Third, the provider uses a dedicated security group rather than inheriting something vague and accidental. The current implementation keeps outbound traffic fully open and allows inbound HTTPS. That is permissive, but it is at least explicit and repeatable. Early-stage platforms benefit from that kind of clarity.&lt;/p&gt;

&lt;p&gt;Fourth, the capacity provider gets its own operator role through &lt;code&gt;AWSLambdaManagedEC2ResourceOperator&lt;/code&gt;. That is a critical detail. Capacity providers are not just Lambda resources; they need AWS to manage the EC2-backed execution infrastructure on your behalf. If you miss that role, the platform does not really exist no matter how nice your Terraform looks.&lt;/p&gt;

&lt;p&gt;Fifth, the instance requirements are opinionated. The code forces &lt;code&gt;arm64&lt;/code&gt; and narrows the fleet to supported Graviton M-family instance types. That is one of the better engineering decisions in this implementation because it converts an architectural preference into an enforceable runtime rule.&lt;/p&gt;

&lt;p&gt;Finally, the Lambda function is attached to the capacity provider in &lt;code&gt;terraform_file/lambda_clm_router_agent.tf&lt;/code&gt; through &lt;code&gt;capacity_provider_config&lt;/code&gt;. That is where the abstraction becomes real. We are not just provisioning a pool and hoping someone uses it later. We are explicitly binding a published Lambda to that pool and tuning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory GiB per vCPU&lt;/li&gt;
&lt;li&gt;max concurrency per execution environment&lt;/li&gt;
&lt;li&gt;ARM64 runtime alignment&lt;/li&gt;
&lt;li&gt;published versioning through Lambda aliases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the full loop: provision shared compute, constrain placement, grant AWS the operator role it needs, attach live functions to the pool, and then manage the resulting version sprawl with automation. That is what makes this feel like a platform artifact rather than a loose Terraform experiment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 1: A Capacity Provider Is Not a Tuning Knob. It Is an Operating Model.
&lt;/h2&gt;

&lt;p&gt;Teams often talk about capacity providers as if they are just a performance optimization. That framing is too shallow.&lt;/p&gt;

&lt;p&gt;The moment you move Lambda onto managed instances, you are no longer only buying faster startup. You are adopting a new operating model with very clear implications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You now care about instance family compatibility&lt;/li&gt;
&lt;li&gt;You need to think about subnet strategy and AZ support&lt;/li&gt;
&lt;li&gt;You have to reason about pool scaling ceilings, concurrency, and memory per vCPU&lt;/li&gt;
&lt;li&gt;You are effectively blending serverless ergonomics with infrastructure accountability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This implementation shows that transition clearly. The CLM router Lambda is not just declared with a runtime and handler. It is attached to the shared capacity provider and explicitly tuned with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;execution_environment_memory_gib_per_vcpu&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;per_execution_environment_max_concurrency&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;publish = true&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;architectures = ["arm64"]&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the tell. Once we start specifying how execution environments should behave, we are no longer simply "deploying a Lambda." We are shaping compute economics.&lt;/p&gt;

&lt;p&gt;The practical lesson here is simple: if you adopt Lambda Managed Instances, treat it like platform engineering, not like a runtime checkbox.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 2: ARM64 Delivers Real Value, but Only if You Respect Service Constraints
&lt;/h2&gt;

&lt;p&gt;One of the strongest decisions in this implementation is the bias toward Graviton. For Python-heavy agent workloads, ARM64 is usually the right default. The economics are better, and the performance-per-dollar story is often compelling.&lt;/p&gt;

&lt;p&gt;But there is an important nuance that the Terraform comments correctly capture: not every EC2 family you might expect is supported in the way you assume. This implementation explicitly avoids unsupported combinations and narrows the fleet to supported M-family Graviton instances.&lt;/p&gt;

&lt;p&gt;That is a good lesson in cloud architecture generally: cloud products market flexibility, but production systems survive on constraint management.&lt;/p&gt;

&lt;p&gt;The teams that do well with modern AWS services are not the ones that assume every SKU works. They are the ones that encode the service's real boundaries in Terraform so no one has to rediscover them during an incident window.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 3: Warmup Is Not a Hack. It Is a Deliberate Control Loop.
&lt;/h2&gt;

&lt;p&gt;There is a tendency in engineering circles to treat "warming" as a slightly embarrassing workaround. I think that is the wrong mindset.&lt;/p&gt;

&lt;p&gt;This implementation schedules the CLM router Lambda every five minutes through EventBridge. The handler itself is intentionally lightweight and effectively acts as a keep-alive mechanism. That is not laziness. It is an explicit decision to keep the shared pool alive for latency-sensitive traffic.&lt;/p&gt;

&lt;p&gt;More specifically, the warmer exists to reduce the probability that the capacity provider has to spin up fresh managed instance capacity for a new invocation path after a quiet period. That is the practical point of the EventBridge rule in &lt;code&gt;terraform_file/eventbridge_cp_arm.tf&lt;/code&gt;. By invoking the Lambda on a steady &lt;code&gt;rate(5 minutes)&lt;/code&gt; schedule, the platform keeps the execution path warm enough that the shared capacity provider is less likely to fall all the way back to a cold, scale-from-zero posture right before a real request arrives.&lt;/p&gt;
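
&lt;p&gt;One way to keep the warm path lightweight is to detect the scheduled event and return before any real work starts. The sketch below relies on the fact that EventBridge scheduled events carry &lt;code&gt;"source": "aws.events"&lt;/code&gt; and &lt;code&gt;"detail-type": "Scheduled Event"&lt;/code&gt;. The &lt;code&gt;route_request&lt;/code&gt; function is a hypothetical stand-in for the router's actual logic.&lt;/p&gt;

```python
def is_warmup_event(event):
    """True for EventBridge scheduled events, which carry these two fields."""
    return (
        event.get("source") == "aws.events"
        and event.get("detail-type") == "Scheduled Event"
    )


def route_request(event):
    """Placeholder for the real routing logic (hypothetical)."""
    return {"warmed": False, "routed": event.get("action", "unknown")}


def handler(event, context):
    if is_warmup_event(event):
        # Return immediately: the goal is to touch the execution path,
        # not to do any work.
        return {"warmed": True}
    return route_request(event)
```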

&lt;p&gt;The important insight is this: once you care about cold-start predictability, you need a control loop.&lt;/p&gt;

&lt;p&gt;That control loop can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provisioned concurrency&lt;/li&gt;
&lt;li&gt;Scheduled warmers&lt;/li&gt;
&lt;li&gt;Request shaping&lt;/li&gt;
&lt;li&gt;A shared managed instance pool&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this design, the team chose scheduled warm invocation plus a shared capacity provider. That is a sensible middle ground. It is cheaper and simpler than overcommitting always-on infrastructure, while still materially reducing the first-hit penalty.&lt;/p&gt;

&lt;p&gt;In plain English: the EventBridge warmer is being used here so the capacity provider does not need to spin up a brand-new server footprint every time traffic reappears after idle time. For interactive or latency-sensitive agent workloads, that is a very practical optimization.&lt;/p&gt;

&lt;p&gt;The strategic lesson is that warmup should be measured against business latency, not ideological purity. If a five-minute EventBridge schedule protects user experience and keeps cost acceptable, it is doing its job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 4: Shared Pools Create Efficiency, but They Also Create Coupling
&lt;/h2&gt;

&lt;p&gt;The capacity provider here is intentionally shared across platform agents and automation services. That is the right move early in a platform journey because it improves utilization and prevents every Lambda from inventing its own isolated infrastructure story.&lt;/p&gt;

&lt;p&gt;But shared pools always introduce two forms of coupling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Technical coupling, because multiple workloads compete for the same execution substrate&lt;/li&gt;
&lt;li&gt;Organizational coupling, because one team's deployment patterns can affect another team's cost and performance envelope&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why the concurrency controls here matter. The CLM router function uses a per-execution-environment concurrency setting, and the environment-specific &lt;code&gt;.tfvars&lt;/code&gt; files pin that concurrency to &lt;code&gt;4&lt;/code&gt;. That is more than a performance number. It is a fairness policy.&lt;/p&gt;

&lt;p&gt;If I were advising a platform team scaling this pattern, I would say this clearly: shared capacity providers are excellent, but they need quota thinking from day one. Otherwise the first successful workload becomes the first noisy neighbor.&lt;/p&gt;
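
&lt;p&gt;A back-of-the-envelope way to reason about that quota: if one execution environment accepts at most four concurrent invocations, the number of environments a workload needs is at least its concurrency divided by four, rounded up. The helper below is only that arithmetic; it is a lower bound and says nothing about how the service actually packs environments onto instances.&lt;/p&gt;

```python
import math


def environments_needed(concurrent_requests, per_environment_concurrency=4):
    """Lower bound on execution environments for a given level of concurrency."""
    return math.ceil(concurrent_requests / per_environment_concurrency)


# 10 concurrent requests at 4 per environment need at least 3 environments.
print(environments_needed(10))
```

&lt;p&gt;Doing this estimate per workload before onboarding it to the shared pool gives the platform team a starting point for quota conversations.&lt;/p&gt;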

&lt;h2&gt;
  
  
  Lesson 5: If You Publish Versions Aggressively, You Need Lifecycle Hygiene on Day One
&lt;/h2&gt;

&lt;p&gt;This implementation makes another good call: the Lambda functions are published, aliased, and then cleaned up with an automated version pruner.&lt;/p&gt;

&lt;p&gt;That matters because version sprawl is one of those quiet operational problems that teams ignore until it becomes annoying enough to disrupt deployments. Published versions accumulate quickly when CI/CD is active. If you do not manage them, you eventually pay in clutter, confusion, or hard service limits.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;lambda_version_pruner&lt;/code&gt; implementation is stronger than a simplistic cleanup script because it preserves what actually matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It scans all Lambda functions&lt;/li&gt;
&lt;li&gt;It filters only functions associated with the target capacity provider&lt;/li&gt;
&lt;li&gt;It lists all aliases and protects aliased versions&lt;/li&gt;
&lt;li&gt;It keeps the latest N published versions&lt;/li&gt;
&lt;li&gt;It deletes everything older that is neither current nor aliased&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly the kind of automation mature teams invest in. Not glamorous. Very valuable.&lt;/p&gt;

&lt;p&gt;There is also an understated platform principle here: rollback is not just about keeping artifacts. It is about keeping the right artifacts. By preserving aliased versions, the pruner respects deployment intent rather than blindly optimizing for tidiness.&lt;/p&gt;
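
&lt;p&gt;The selection rule at the heart of the pruner can be sketched as a pure function, separate from any AWS calls. This is a simplified illustration rather than the actual &lt;code&gt;lambda_version_pruner&lt;/code&gt; code; the function and parameter names are assumptions. In a real pruner the inputs would come from the Lambda API (for example boto3's &lt;code&gt;list_versions_by_function&lt;/code&gt; and &lt;code&gt;list_aliases&lt;/code&gt;).&lt;/p&gt;

```python
def versions_to_delete(published_versions, aliased_versions, keep_latest):
    """Choose which published Lambda versions are safe to delete.

    published_versions: version strings, which may include "$LATEST".
    aliased_versions: versions that any alias points to.
    keep_latest: how many of the newest numbered versions to retain.
    """
    # Version identifiers are numeric strings, so sort them as integers.
    numbered = sorted(
        (v for v in published_versions if v != "$LATEST"),
        key=int,
        reverse=True,
    )
    protected = set(numbered[:keep_latest]) | set(aliased_versions)
    # Everything else is neither recent nor referenced by an alias.
    return [v for v in numbered if v not in protected]


print(versions_to_delete(["$LATEST", "1", "2", "3", "4", "5"], {"2"}, keep_latest=2))
# prints ['3', '1']
```

&lt;p&gt;Keeping the decision separate from the deletion call makes it easy to unit test and to run in a dry-run mode first. Note that an alias with weighted routing points at two versions, so both belong in &lt;code&gt;aliased_versions&lt;/code&gt;.&lt;/p&gt;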

&lt;p&gt;There is also a more practical capacity-provider reason for doing this, and it deserves to be stated directly.&lt;/p&gt;

&lt;p&gt;When you run a shared Lambda Managed Instances pool, you want the platform to spend its effort on the versions that are actually serving traffic, warming correctly, or remaining available for safe rollback. If old published versions keep accumulating forever, three unhealthy things tend to happen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;operators lose clarity on which versions are still meaningful&lt;/li&gt;
&lt;li&gt;rollback and alias management become noisier than they should be&lt;/li&gt;
&lt;li&gt;the shared platform carries more deployment residue than useful runtime intent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Strictly speaking, deleting old Lambda versions does not magically increase CPU on the capacity provider. What it does do is improve platform hygiene around the shared pool. It ensures that the versions attached to aliases, warmup patterns, and deployment workflows remain deliberate and limited. In other words, it improves capacity-provider utilization indirectly by reducing version sprawl around the workloads that consume that shared capacity.&lt;/p&gt;

&lt;p&gt;That matters in real operations. The healthier the deployment surface is, the easier it is to reason about what is warming, what is active, what can be rolled back, and what should no longer influence the platform at all.&lt;/p&gt;

&lt;p&gt;So the version pruner is not just a cleanup utility. It is part of making the shared capacity provider operationally efficient. Not by adding raw compute, but by reducing noise, protecting the versions that matter, and keeping the platform focused on live execution paths instead of historical leftovers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 6: GitHub Actions Should Orchestrate. CodeBuild Should Execute.
&lt;/h2&gt;

&lt;p&gt;Architecturally, the CI/CD model here is sensible.&lt;/p&gt;

&lt;p&gt;GitHub Actions is used as the control plane:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;branch-based triggering&lt;/li&gt;
&lt;li&gt;security scanning&lt;/li&gt;
&lt;li&gt;environment selection&lt;/li&gt;
&lt;li&gt;AWS credential injection&lt;/li&gt;
&lt;li&gt;build orchestration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS CodeBuild is used as the execution plane:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terraform install&lt;/li&gt;
&lt;li&gt;&lt;code&gt;terraform init&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;terraform validate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;terraform plan&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;terraform apply&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I like this split. It keeps GitHub Actions lightweight and makes AWS the place where the actual infrastructure mutation happens. That usually gives better access control, cleaner auditability, and fewer surprises around long-running plan or apply steps.&lt;/p&gt;

&lt;p&gt;The buildspecs pin Terraform &lt;code&gt;1.12.2&lt;/code&gt;, install the CLI explicitly, and then execute plan/apply flows with environment-specific variable files. That is exactly the kind of boring repeatability you want in infrastructure delivery.&lt;/p&gt;

&lt;p&gt;This is one of the most practical lessons from the implementation: do not force GitHub Actions to be your full deployment runtime if AWS-native execution gives you better control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 7: CI/CD Maturity Is Not About Having a Pipeline. It Is About Where the Gates Actually Are.
&lt;/h2&gt;

&lt;p&gt;The implementation also reveals a harder truth: CI/CD design is won or lost not by YAML volume, but by trigger discipline.&lt;/p&gt;

&lt;p&gt;There are some good instincts here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dev deployment is chained off a successful security workflow&lt;/li&gt;
&lt;li&gt;Security scanning runs on push and PR for &lt;code&gt;dev&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;PR security review is scoped only to actual code and infrastructure changes&lt;/li&gt;
&lt;li&gt;Environment-specific secrets are used for AWS access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That said, the current implementation also shows the kinds of issues every fast-moving team encounters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The dev deploy workflow is triggered by &lt;code&gt;Security Checks (Push)&lt;/code&gt;, not by a broader quality gate such as tests plus security plus static analysis&lt;/li&gt;
&lt;li&gt;The QA workflow is currently triggered on &lt;code&gt;pull_request&lt;/code&gt; to &lt;code&gt;qa&lt;/code&gt;, yet it also includes an apply stage, which is a risky combination&lt;/li&gt;
&lt;li&gt;The sanity workflow references a different CodeBuild project naming pattern, which looks like copy-forward drift from another implementation&lt;/li&gt;
&lt;li&gt;One dev apply step mixes generic and environment-specific secrets in a way that deserves tightening&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a criticism of the team. It is actually the most authentic part of the system.&lt;/p&gt;

&lt;p&gt;Real pipelines evolve through reuse, renaming, urgency, and partial migration. The useful engineering habit is not pretending they are pristine. It is recognizing that pipeline drift is itself a production concern.&lt;/p&gt;

&lt;p&gt;My blunt lesson here is this: CI/CD is software. It needs the same review rigor as application code.&lt;/p&gt;
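
&lt;p&gt;One way to give pipelines that rigor is to check trigger and stage combinations mechanically. The sketch below assumes a hypothetical, simplified view of each workflow as a dictionary of &lt;code&gt;triggers&lt;/code&gt; and &lt;code&gt;stages&lt;/code&gt;. It does not parse real GitHub Actions YAML, and the two rules are only examples of the kind of gate worth encoding.&lt;/p&gt;

```python
from typing import Dict, List

# Hypothetical, simplified view of a workflow: its triggers and the Terraform stages it runs.
Workflow = Dict[str, List[str]]


def gate_violations(name: str, workflow: Workflow) -> List[str]:
    """Flag trigger and stage combinations that bypass review."""
    problems = []
    triggers = set(workflow.get("triggers", []))
    stages = set(workflow.get("stages", []))
    if "pull_request" in triggers and "apply" in stages:
        problems.append(f"{name}: apply stage is reachable from a pull_request trigger")
    if "apply" in stages and "plan" not in stages:
        problems.append(f"{name}: apply runs without a preceding plan stage")
    return problems


qa = {"triggers": ["pull_request"], "stages": ["plan", "apply"]}
dev = {"triggers": ["workflow_run"], "stages": ["plan", "apply"]}
for name, wf in {"qa": qa, "dev": dev}.items():
    for problem in gate_violations(name, wf):
        print(problem)
# prints: qa: apply stage is reachable from a pull_request trigger
```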

&lt;h2&gt;
  
  
  Lesson 8: Documentation Drift Is a Reliability Signal
&lt;/h2&gt;

&lt;p&gt;The README here is ambitious and useful, but parts of it clearly describe a broader or earlier architecture than the exact files currently present. That mismatch is more important than most teams realize.&lt;/p&gt;

&lt;p&gt;When documentation and implementation diverge, three things happen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;new engineers learn the wrong system&lt;/li&gt;
&lt;li&gt;reviewers approve changes with outdated mental models&lt;/li&gt;
&lt;li&gt;incidents take longer to resolve because operators trust stale diagrams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One of the best engineering habits is to treat documentation drift as an operational bug, not as a cosmetic issue.&lt;/p&gt;

&lt;p&gt;This implementation makes that case well. The code is the source of truth. The docs are directionally strong, but some names, workflow descriptions, and file references have clearly moved over time. That is normal. What matters is catching it before the next engineer builds decisions on old assumptions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 9: The Default VPC Is Fine for Speed, but It Should Be a Conscious Temporary Convenience
&lt;/h2&gt;

&lt;p&gt;The Terraform intentionally uses the default VPC and default subnets, then layers in filtering and a custom security group. For early velocity, that is an acceptable choice. It removes friction and makes the first deployment much easier.&lt;/p&gt;

&lt;p&gt;But teams should be honest about the tradeoff.&lt;/p&gt;

&lt;p&gt;Using the default VPC accelerates setup. It does not provide the same clarity, segmentation, or policy hygiene that a dedicated workload VPC eventually should. The inbound HTTPS rule from &lt;code&gt;0.0.0.0/0&lt;/code&gt; is another example of where a practical early-stage decision should later be revisited with a more opinionated security posture.&lt;/p&gt;

&lt;p&gt;My view is simple: default VPC usage is fine when it is a speed decision. It becomes dangerous when it silently hardens into architecture.&lt;/p&gt;
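
&lt;p&gt;A small check can keep that decision visible instead of letting it harden silently. The sketch below assumes ingress rules have already been exported into plain dictionaries with hypothetical &lt;code&gt;port&lt;/code&gt; and &lt;code&gt;cidr&lt;/code&gt; keys. It reads neither Terraform state nor the EC2 API.&lt;/p&gt;

```python
import ipaddress
from typing import Dict, List


def world_open_rules(rules: List[Dict]) -> List[Dict]:
    """Return ingress rules whose CIDR covers the entire IPv4 or IPv6 space."""
    flagged = []
    for rule in rules:
        network = ipaddress.ip_network(rule["cidr"], strict=False)
        # A prefix length of zero means "every address", e.g. 0.0.0.0/0 or ::/0.
        if network.prefixlen == 0:
            flagged.append(rule)
    return flagged


# Hypothetical rule set for illustration.
ingress = [
    {"port": 443, "cidr": "0.0.0.0/0"},
    {"port": 443, "cidr": "10.0.0.0/16"},
]
print(world_open_rules(ingress))
# prints [{'port': 443, 'cidr': '0.0.0.0/0'}]
```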

&lt;h2&gt;
  
  
  Lesson 10: Least Privilege Usually Loses the First Battle. Do Not Let It Lose the War.
&lt;/h2&gt;

&lt;p&gt;The Lambda IAM policy for the router function is broad. Very broad.&lt;/p&gt;

&lt;p&gt;That is common when a platform team is trying to unblock integration work quickly across S3, SQS, SNS, DynamoDB, Bedrock, AppSync, logs, X-Ray, and secrets. The version pruner is noticeably tighter, which is encouraging. But the broader pattern remains familiar: the first version of a system usually over-grants.&lt;/p&gt;

&lt;p&gt;The lesson is not "never do that." The lesson is "know when you are doing it, and schedule the hardening work while the platform is still comprehensible."&lt;/p&gt;

&lt;p&gt;Security debt compounds. The longer a wide-open policy survives, the more invisible it becomes.&lt;/p&gt;
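
&lt;p&gt;To keep a wide policy from becoming invisible, it helps to surface it on every review. The following is a rough sketch that scans a policy document for service-wide actions granted on all resources. The policy shown is a hypothetical example, not the router function's actual policy, and the check covers only this one wildcard pattern.&lt;/p&gt;

```python
from typing import Dict, List


def as_list(value):
    """IAM allows a single string or a list for Action and Resource."""
    return value if isinstance(value, list) else [value]


def broad_statements(policy: Dict) -> List[str]:
    """Describe Allow statements that pair wildcard actions with all resources."""
    findings = []
    for index, stmt in enumerate(as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        actions = as_list(stmt.get("Action", []))
        resources = as_list(stmt.get("Resource", []))
        wide_actions = [a for a in actions if a == "*" or a.endswith(":*")]
        if wide_actions and "*" in resources:
            findings.append(f"statement {index}: {', '.join(wide_actions)} on all resources")
    return findings


# Hypothetical policy for illustration only.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:*", "sqs:*"], "Resource": "*"},
        {"Effect": "Allow", "Action": "lambda:DeleteFunction",
         "Resource": "arn:aws:lambda:*:*:function:example"},
    ],
}
print(broad_statements(example_policy))
# prints ['statement 0: s3:*, sqs:* on all resources']
```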

&lt;h2&gt;
  
  
  What This Repo Gets Right
&lt;/h2&gt;

&lt;p&gt;If I strip away the drift and focus on the platform instincts, this implementation gets a lot right:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It treats capacity provider infrastructure as shared platform capability, not one-off function plumbing&lt;/li&gt;
&lt;li&gt;It optimizes for ARM64 economics instead of defaulting to x86 out of habit&lt;/li&gt;
&lt;li&gt;It acknowledges cold starts as a business problem and addresses them operationally&lt;/li&gt;
&lt;li&gt;It preserves rollback safety with aliases while still pruning version sprawl&lt;/li&gt;
&lt;li&gt;It separates orchestration from execution in CI/CD&lt;/li&gt;
&lt;li&gt;It encodes AWS service constraints in Terraform comments and defaults, which reduces tribal knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a strong foundation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Would Improve Next
&lt;/h2&gt;

&lt;p&gt;If I were turning this into the next version of a production-grade internal platform, I would prioritize the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Tighten naming consistency across the implementation.&lt;br&gt;
The capacity provider name appears in slightly different forms across resources. That is how automation misses its target. Shared naming locals should eliminate this class of error.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make QA and production promotion rules stricter.&lt;br&gt;
A PR-triggered apply path should be removed. The cleaner model is to plan on PR and apply only from a protected branch or behind an approved environment gate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run Terraform from a single explicit working directory.&lt;br&gt;
The current layout places Terraform under &lt;code&gt;terraform_file/&lt;/code&gt;, while some buildspec commands read like root-level execution. That ambiguity should be eliminated.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Move from broad IAM toward intent-based policies.&lt;br&gt;
Especially for the router Lambda, policy scope should narrow as the workload stabilizes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Revisit networking posture.&lt;br&gt;
The default VPC is fine for speed; a dedicated VPC model is better for longevity, auditability, and controlled ingress.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add stronger deployment quality gates.&lt;br&gt;
Security review is useful, but infrastructure promotion should also hang off validation, tests, linting, and explicit approval where appropriate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add platform observability as code.&lt;br&gt;
CloudWatch alarms, dashboarding, and cost visibility for the capacity provider should be treated as first-class Terraform resources, not follow-up tasks.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Bigger Technical Lesson
&lt;/h2&gt;

&lt;p&gt;The biggest takeaway from this implementation is not about Lambda specifically.&lt;/p&gt;

&lt;p&gt;It is about how modern platform teams should build.&lt;/p&gt;

&lt;p&gt;We should absolutely chase better cost-performance curves. We should use managed primitives aggressively. We should automate the boring work. But we also need the discipline to encode what we learn while the system is still small enough to reason about.&lt;/p&gt;

&lt;p&gt;What makes this useful is that it shows both halves of real engineering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the architectural intent&lt;/li&gt;
&lt;li&gt;the implementation scars&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination is where credible engineering judgment comes from.&lt;/p&gt;

&lt;p&gt;Anyone can present a clean target state. The harder and more useful skill is building systems that survive contact with deployment friction, service constraints, naming drift, and operational reality.&lt;/p&gt;

&lt;p&gt;That is what this implementation is doing. And that is why the lessons here matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Thought
&lt;/h2&gt;

&lt;p&gt;Capacity providers, warmers, version pruning, and GitHub-driven delivery are not separate topics. They are all answers to the same technical question:&lt;/p&gt;

&lt;p&gt;How do we make cloud systems faster, cheaper, safer, and more repeatable without turning every application team into a specialized infrastructure group?&lt;/p&gt;

&lt;p&gt;In this implementation, the answer was to centralize the hard platform decisions, automate the hygiene, keep the runtime warm where it matters, and stay honest about the places where the system still needs tightening.&lt;/p&gt;

&lt;p&gt;That is not just good infrastructure work.&lt;/p&gt;

&lt;p&gt;That is good engineering practice.&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>lambda</category>
      <category>aws</category>
    </item>
    <item>
      <title>Lessons I learned building a memory-aware agent with Amazon Bedrock AgentCore Runtime</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 20 Apr 2026 18:10:16 +0000</pubDate>
      <link>https://dev.to/amitkayal/lessons-i-learned-building-a-memory-aware-agent-with-amazon-bedrock-agentcore-runtime-4lc9</link>
      <guid>https://dev.to/amitkayal/lessons-i-learned-building-a-memory-aware-agent-with-amazon-bedrock-agentcore-runtime-4lc9</guid>
      <description>&lt;h1&gt;
  
  
  Lessons I learned building a memory-aware agent with Amazon Bedrock AgentCore Runtime
&lt;/h1&gt;

&lt;p&gt;When I started building an agent with Amazon Bedrock AgentCore Runtime, I thought the difficult parts would be model selection, tool wiring, and deployment. Those certainly mattered, but the part that shaped the quality of the agent most was memory.&lt;/p&gt;

&lt;p&gt;The first version of the agent could answer single prompts well enough, but it did not behave like a real multi-turn system. Follow-up questions were brittle. The agent lost short-range intent. Tool usage worked, but only within the narrow boundaries of the current prompt. As soon as the conversation depended on what happened one or two turns earlier, the system started to feel less like an agent and more like a stateless inference endpoint.&lt;/p&gt;

&lt;p&gt;That experience changed how I approached the design. I stopped thinking about memory as a convenience feature and started treating it as part of the runtime architecture itself. This article is a distillation of the most important lessons I learned while building a short-term-memory-aware agent with Amazon Bedrock AgentCore Runtime and Strands.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 1: An agent is not really multi-turn until memory is part of the lifecycle
&lt;/h2&gt;

&lt;p&gt;One of the first things I learned is that conversational continuity does not emerge automatically just because the application calls the same runtime repeatedly.&lt;/p&gt;

&lt;p&gt;Without short-term memory, the agent only sees the current prompt unless the application keeps reconstructing and replaying history manually. That creates several problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;previous instructions are easy to lose,&lt;/li&gt;
&lt;li&gt;tool chains become fragile across turns,&lt;/li&gt;
&lt;li&gt;users have to restate identifiers and intent,&lt;/li&gt;
&lt;li&gt;the system becomes increasingly prompt-shaped rather than interaction-shaped.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What became clear to me is that short-term memory is not about storing everything forever. It is about preserving enough recent state for the current conversation to remain coherent.&lt;/p&gt;

&lt;p&gt;That distinction matters. I was not trying to build a knowledge base or semantic fact store. I was trying to answer a simpler question: how do I help the agent remember what we were just doing?&lt;/p&gt;

&lt;p&gt;Once I framed the problem that way, the architecture became much clearer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 2: The cleanest pattern is explicit memory, not implicit transcript magic
&lt;/h2&gt;

&lt;p&gt;Another lesson I learned quickly is that I did not want memory to be hidden behind vague runtime behavior. I wanted the agent code to make memory use explicit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;where memory comes from,&lt;/li&gt;
&lt;li&gt;when it is read,&lt;/li&gt;
&lt;li&gt;when it is written,&lt;/li&gt;
&lt;li&gt;which user it belongs to,&lt;/li&gt;
&lt;li&gt;which conversation it belongs to.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That led me to a pattern built around &lt;code&gt;MemoryClient&lt;/code&gt; and hooks.&lt;/p&gt;

&lt;p&gt;Instead of treating memory like a passive transcript that somehow appears at the edge of the request, I found it much more reliable to think about it as a lifecycle-managed dependency:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;create a short-term memory resource,&lt;/li&gt;
&lt;li&gt;pass the memory identity into the runtime,&lt;/li&gt;
&lt;li&gt;read recent turns when the agent initializes,&lt;/li&gt;
&lt;li&gt;write new messages as events when the conversation changes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The important shift for me was this: memory worked best when it was part of the agent object model, not just part of request handling glue code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 3: Hooks are where memory belongs
&lt;/h2&gt;

&lt;p&gt;This was probably the biggest implementation insight.&lt;/p&gt;

&lt;p&gt;Once I had a Strands-based agent running inside AgentCore Runtime, I needed to decide where the memory logic should live. I could have put everything directly into the entrypoint and manually stitched together request parsing, history retrieval, message persistence, and prompt injection. That would have worked, but it would have made the agent lifecycle harder to reason about.&lt;/p&gt;

&lt;p&gt;What worked better was using hooks tied to the agent lifecycle itself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;AgentInitializedEvent&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;MessageAddedEvent&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That structure gave me a much cleaner mental model.&lt;/p&gt;

&lt;p&gt;On initialization, the agent needs context before it reasons. That is the right moment to retrieve the most recent turns from memory and inject them into prompt context.&lt;/p&gt;

&lt;p&gt;When a new message is added, the conversation state has changed. That is the right moment to persist the latest user or assistant message back into memory.&lt;/p&gt;

&lt;p&gt;The core interaction looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;recent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_last_k_turns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What I like about this model is that it is deterministic.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory load happens before reasoning,&lt;/li&gt;
&lt;li&gt;memory write happens when conversation state changes,&lt;/li&gt;
&lt;li&gt;both operations use the same identity boundaries,&lt;/li&gt;
&lt;li&gt;the entrypoint stays focused on request extraction rather than conversation orchestration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That made the system easier to debug, easier to extend, and much easier to explain.&lt;/p&gt;
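
&lt;p&gt;The load-then-persist lifecycle can be exercised without any AWS dependency. In the sketch below, &lt;code&gt;InMemoryMemoryClient&lt;/code&gt; is a hypothetical local stand-in that mirrors only the two method names used above, and &lt;code&gt;MemoryHooks&lt;/code&gt; is an illustrative class whose methods would be wired to &lt;code&gt;AgentInitializedEvent&lt;/code&gt; and &lt;code&gt;MessageAddedEvent&lt;/code&gt; in a real agent. The stand-in treats one stored message as one turn, which is a simplification.&lt;/p&gt;

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (text, role), matching the tuple order used above


class InMemoryMemoryClient:
    """Hypothetical local stand-in for MemoryClient; stores events per scope."""

    def __init__(self) -> None:
        self._events: Dict[Tuple[str, str, str], List[Message]] = defaultdict(list)

    def create_event(self, memory_id: str, actor_id: str, session_id: str,
                     messages: List[Message]) -> None:
        self._events[(memory_id, actor_id, session_id)].extend(messages)

    def get_last_k_turns(self, memory_id: str, actor_id: str, session_id: str,
                         k: int) -> List[Message]:
        # Simplification: one stored message counts as one turn here.
        return self._events[(memory_id, actor_id, session_id)][-k:]


class MemoryHooks:
    """Loads context on initialization and persists each new message."""

    def __init__(self, client, memory_id: str, actor_id: str, session_id: str) -> None:
        self.client = client
        # Both operations share the same identity boundaries.
        self.scope = {"memory_id": memory_id, "actor_id": actor_id, "session_id": session_id}

    def on_agent_initialized(self) -> str:
        recent = self.client.get_last_k_turns(k=5, **self.scope)
        return "\n".join(f"{role}: {text}" for text, role in recent)

    def on_message_added(self, text: str, role: str) -> None:
        self.client.create_event(messages=[(text, role)], **self.scope)


client = InMemoryMemoryClient()
first = MemoryHooks(client, "mem-1", "user-42", "session-a")
first.on_message_added("Summarize meeting 17", "user")
first.on_message_added("Here is the summary.", "assistant")

# A later request builds a new hooks object but sees the same conversation.
second = MemoryHooks(client, "mem-1", "user-42", "session-a")
print(second.on_agent_initialized())
```

&lt;p&gt;The second hooks object is a fresh instance, yet it recovers both earlier messages because the scope is identical. A different &lt;code&gt;session_id&lt;/code&gt; would return nothing.&lt;/p&gt;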

&lt;h2&gt;
  
  
  Lesson 4: Identity is the real memory boundary
&lt;/h2&gt;

&lt;p&gt;Before building this, I thought of memory mostly as a storage problem. In practice, I learned it is just as much an identity problem.&lt;/p&gt;

&lt;p&gt;The two identifiers that mattered most were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;actor_id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;session_id&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation ended up being foundational.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why &lt;code&gt;actor_id&lt;/code&gt; matters
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;actor_id&lt;/code&gt; is the user boundary. If that identifier is unstable, absent, or inconsistent, memory quality degrades immediately.&lt;/p&gt;

&lt;p&gt;What I learned is that a memory system is only as good as the application identity you feed into it. If the same user appears under multiple IDs, the agent cannot retrieve a coherent conversational history. If two users are accidentally mapped to the same identity, memory becomes unsafe.&lt;/p&gt;

&lt;p&gt;So one of my strongest takeaways is that &lt;code&gt;actor_id&lt;/code&gt; should always come from a stable authenticated user identity, not from an incidental client-generated value.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why &lt;code&gt;session_id&lt;/code&gt; matters
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;session_id&lt;/code&gt; turned out to be just as important. A single user does not have just one conversation. They may have multiple active threads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one troubleshooting flow,&lt;/li&gt;
&lt;li&gt;one transcript analysis request,&lt;/li&gt;
&lt;li&gt;one abandoned conversation from earlier,&lt;/li&gt;
&lt;li&gt;one brand-new task.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without a session boundary, all of that collapses into one memory stream. The agent might technically “remember,” but it remembers too much of the wrong thing.&lt;/p&gt;

&lt;p&gt;That was a key lesson for me: useful memory is not just preserved memory. It is correctly scoped memory.&lt;/p&gt;
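
&lt;p&gt;A minimal sketch of how those two boundaries might be resolved per request. It assumes the authenticated identity arrives as a claims dictionary with a &lt;code&gt;sub&lt;/code&gt; entry, and that a missing session identifier means a new conversation. Both the function name and that policy are illustrative assumptions.&lt;/p&gt;

```python
import uuid
from typing import Dict, Optional, Tuple


def resolve_identity(claims: Dict[str, str],
                     requested_session: Optional[str] = None) -> Tuple[str, str]:
    """Return (actor_id, session_id) for one invocation."""
    # The actor comes from the authenticated subject, never from a client-generated value.
    actor_id = claims.get("sub", "").strip()
    if not actor_id:
        raise ValueError("authenticated subject is required to scope memory")
    # A missing session means a new conversation, so mint a fresh identifier.
    session_id = requested_session or f"session-{uuid.uuid4().hex}"
    return actor_id, session_id


print(resolve_identity({"sub": "user-42"}, "troubleshooting-1"))
# prints ('user-42', 'troubleshooting-1')
```

&lt;p&gt;Failing loudly on a missing subject is deliberate: a fallback identity would let two users share one memory stream.&lt;/p&gt;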

&lt;h2&gt;
  
  
  Lesson 5: The agent should be rebuilt per request, but memory should persist across requests
&lt;/h2&gt;

&lt;p&gt;This was an architectural point that became clearer as I implemented the runtime flow.&lt;/p&gt;

&lt;p&gt;The Strands agent instance itself is created per request. That makes sense because each invocation carries request-specific state:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the current user prompt,&lt;/li&gt;
&lt;li&gt;the active user identity,&lt;/li&gt;
&lt;li&gt;the active conversation session,&lt;/li&gt;
&lt;li&gt;the active tool and runtime context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But memory should not behave like request-local state. Memory has to outlive the agent instance and remain keyed to the same user and conversation across invocations.&lt;/p&gt;

&lt;p&gt;That split was important for me to internalize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;agent instance lifecycle is short,&lt;/li&gt;
&lt;li&gt;conversation memory lifecycle is longer,&lt;/li&gt;
&lt;li&gt;the link between them is established through state and hooks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once I started thinking in those terms, the design felt much more natural.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 6: Deployment is part of the memory design
&lt;/h2&gt;

&lt;p&gt;I originally thought of deployment as a separate concern from conversational behavior. Building this agent convinced me that the two are tightly connected.&lt;/p&gt;

&lt;p&gt;The runtime needs to know which memory resource it should use, but I did not want that decision hardcoded in application logic. The better pattern was to resolve the correct memory resource during deployment and pass that identity into the runtime as configuration.&lt;/p&gt;

&lt;p&gt;In practice, that meant the runtime received environment-specific values such as:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AGENT_NAME=&amp;lt;agent-name&amp;gt;
MEMORY_ID=&amp;lt;memory-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gave me a few benefits immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the same application code could move across environments,&lt;/li&gt;
&lt;li&gt;memory resources stayed aligned with environment boundaries,&lt;/li&gt;
&lt;li&gt;the runtime remained configurable without source changes,&lt;/li&gt;
&lt;li&gt;the control plane remained the primary place where resource binding happened.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One of the clearest lessons here is that memory should be treated like any other environment-bound infrastructure dependency. If it is not part of deployment, it tends to become a hidden assumption.&lt;/p&gt;
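
&lt;p&gt;On the runtime side, reading that binding can stay very small. The sketch below assumes the two variables shown above. The &lt;code&gt;RuntimeConfig&lt;/code&gt; class and the rule that an empty &lt;code&gt;MEMORY_ID&lt;/code&gt; means memory is not bound are illustrative choices, not part of AgentCore.&lt;/p&gt;

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeConfig:
    agent_name: str
    memory_id: Optional[str]

    @property
    def memory_enabled(self) -> bool:
        return self.memory_id is not None


def load_config(env: Mapping[str, str] = os.environ) -> RuntimeConfig:
    """Read deployment-provided bindings; the code itself names no resource."""
    agent_name = env.get("AGENT_NAME", "").strip()
    if not agent_name:
        raise RuntimeError("AGENT_NAME must be set by the deployment workflow")
    # An empty MEMORY_ID is treated as "memory not bound" rather than a valid identifier.
    memory_id = env.get("MEMORY_ID", "").strip() or None
    return RuntimeConfig(agent_name=agent_name, memory_id=memory_id)


# Hypothetical values; in the runtime these come from the process environment.
config = load_config({"AGENT_NAME": "transcript-agent", "MEMORY_ID": "mem-dev-001"})
print(config.memory_enabled)
# prints True
```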

&lt;h2&gt;
  
  
  Lesson 7: Short-term memory and long-term memory solve different problems
&lt;/h2&gt;

&lt;p&gt;I found it helpful to stop using the word “memory” as if it meant one thing.&lt;/p&gt;

&lt;p&gt;Short-term memory answers the question:&lt;/p&gt;

&lt;p&gt;"What was happening in this conversation recently?"&lt;/p&gt;

&lt;p&gt;Long-term memory answers a different question:&lt;/p&gt;

&lt;p&gt;"What durable information should the system remember beyond this immediate interaction?"&lt;/p&gt;

&lt;p&gt;For the agent I was building, the short-term problem came first. I needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;recent-turn continuity,&lt;/li&gt;
&lt;li&gt;bounded replay,&lt;/li&gt;
&lt;li&gt;session-scoped context,&lt;/li&gt;
&lt;li&gt;predictable event retention.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I did not need semantic fact retrieval in the first phase. I did not need vector search for historical knowledge. I needed the agent to remain coherent across adjacent turns.&lt;/p&gt;

&lt;p&gt;That was an important design simplification. It kept the first version of the memory architecture focused on event continuity instead of overextending into knowledge retrieval prematurely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 8: Recent-turn replay should be bounded
&lt;/h2&gt;

&lt;p&gt;Once I had memory retrieval working, the next question was how much of it to inject back into the agent context.&lt;/p&gt;

&lt;p&gt;My lesson here was simple: more memory is not always better memory.&lt;/p&gt;

&lt;p&gt;If too much prior conversation is replayed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompt size grows,&lt;/li&gt;
&lt;li&gt;token cost grows,&lt;/li&gt;
&lt;li&gt;stale context starts competing with the current task,&lt;/li&gt;
&lt;li&gt;reasoning quality can actually decline.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I found the most practical pattern was to retrieve the last few turns and inject them into prompt context in a compact representation. In this design, that replay window was bounded at five turns.&lt;/p&gt;

&lt;p&gt;That gave me a good balance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;enough recent context for continuity,&lt;/li&gt;
&lt;li&gt;small enough context for predictable prompt growth,&lt;/li&gt;
&lt;li&gt;simple enough formatting to inspect and debug.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This also reinforced another lesson: short-term memory should be operationally understandable. I want to know what context the model saw, not just trust that some opaque memory layer handled it correctly.&lt;/p&gt;
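
&lt;p&gt;A compact, inspectable replay block might be built like this. The five-turn window matches the design described above, while the per-message character cap (&lt;code&gt;max_chars&lt;/code&gt;) and the &lt;code&gt;role: text&lt;/code&gt; layout are assumptions added for illustration.&lt;/p&gt;

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (role, text)


def build_context(turns: List[Turn], max_turns: int = 5, max_chars: int = 300) -> str:
    """Render the most recent turns as a compact, inspectable block."""
    window = turns[-max_turns:] if max_turns > 0 else []
    lines = []
    for role, text in window:
        # Collapse whitespace and clip long messages so prompt growth stays predictable.
        compact = " ".join(text.split())
        if len(compact) > max_chars:
            compact = compact[: max_chars - 3] + "..."
        lines.append(f"{role}: {compact}")
    return "\n".join(lines)


history = [("user", f"message {i}") for i in range(8)]
print(build_context(history))
```

&lt;p&gt;With eight stored turns, only the last five are rendered, starting at &lt;code&gt;message 3&lt;/code&gt;. Because the result is a plain string, it can be logged alongside the request to show exactly what context the model saw.&lt;/p&gt;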

&lt;h2&gt;
  
  
  Lesson 9: Memory becomes more valuable when tools are involved
&lt;/h2&gt;

&lt;p&gt;The agent I built was not just a conversational shell. It had tools, including domain-specific behavior such as transcript retrieval and AWS interactions.&lt;/p&gt;

&lt;p&gt;That is where the value of short-term memory became even more obvious.&lt;/p&gt;

&lt;p&gt;In a tool-using workflow, the user often does not repeat the full context every turn. They say things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"use the same meeting"&lt;/li&gt;
&lt;li&gt;"what did the second speaker say?"&lt;/li&gt;
&lt;li&gt;"now summarize that"&lt;/li&gt;
&lt;li&gt;"check the S3 output from before"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without memory, the agent has to reconstruct working state from a single prompt. With memory, the agent has a much better chance of preserving:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the active object under discussion,&lt;/li&gt;
&lt;li&gt;the prior user instruction,&lt;/li&gt;
&lt;li&gt;the last tool result,&lt;/li&gt;
&lt;li&gt;the intended next step.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One of my strongest takeaways is that memory is not just a conversational improvement. It is a workflow improvement. It makes tool orchestration across turns materially more coherent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 10: Failure modes need to be designed, not discovered in production
&lt;/h2&gt;

&lt;p&gt;Building this also made me think much more carefully about degraded behavior.&lt;/p&gt;

&lt;p&gt;If memory resolution fails and the runtime cannot find a memory resource, the agent may still run. That sounds convenient, but it also means the system may silently shift from stateful to stateless behavior.&lt;/p&gt;

&lt;p&gt;That taught me to treat the following as first-class operational conditions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory enabled,&lt;/li&gt;
&lt;li&gt;memory disabled,&lt;/li&gt;
&lt;li&gt;memory load succeeded,&lt;/li&gt;
&lt;li&gt;memory write succeeded,&lt;/li&gt;
&lt;li&gt;memory resolution failed,&lt;/li&gt;
&lt;li&gt;identity inputs were missing or malformed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The same thing applies to identity mistakes.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;actor_id&lt;/code&gt; is unstable, memory becomes fragmented.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;session_id&lt;/code&gt; is reused incorrectly, unrelated conversations bleed into each other.&lt;/p&gt;

&lt;p&gt;If replay windows grow without discipline, prompt quality degrades.&lt;/p&gt;

&lt;p&gt;These are not edge cases. They are part of the normal operating surface of a memory-aware agent.&lt;/p&gt;
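
&lt;p&gt;One way to keep the stateless fallback from being silent is to classify the memory state explicitly on every invocation. The status names and the function below are hypothetical. They sketch the idea of turning the conditions listed above into a value that can be logged and alarmed on.&lt;/p&gt;

```python
import logging
from enum import Enum
from typing import Optional

logger = logging.getLogger("agent.memory")


class MemoryStatus(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"
    RESOLUTION_FAILED = "resolution_failed"
    IDENTITY_INVALID = "identity_invalid"


def classify_memory_state(memory_id: Optional[str], resolution_error: Optional[str],
                          actor_id: Optional[str], session_id: Optional[str]) -> MemoryStatus:
    """Map runtime inputs to an explicit status so stateless fallback is never silent."""
    if resolution_error:
        status = MemoryStatus.RESOLUTION_FAILED
    elif not memory_id:
        status = MemoryStatus.DISABLED
    elif not actor_id or not session_id:
        status = MemoryStatus.IDENTITY_INVALID
    else:
        status = MemoryStatus.ENABLED
    if status is not MemoryStatus.ENABLED:
        # Emit a signal operators can count, rather than quietly running stateless.
        logger.warning("memory not active: %s", status.value)
    return status


print(classify_memory_state("mem-1", None, "user-42", None).value)
# prints identity_invalid
```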

&lt;h2&gt;
  
  
  Lesson 11: Retention, privacy, and compliance show up earlier than expected
&lt;/h2&gt;

&lt;p&gt;Short-term memory sounds lightweight, but it is still stored interaction data.&lt;/p&gt;

&lt;p&gt;That means retention policy is not just a platform setting. It is part of the product design. While building this, I became much more aware that memory decisions quickly intersect with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;data handling policy,&lt;/li&gt;
&lt;li&gt;privacy expectations,&lt;/li&gt;
&lt;li&gt;deletion and retention requirements,&lt;/li&gt;
&lt;li&gt;security review,&lt;/li&gt;
&lt;li&gt;production observability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The technical implementation can be elegant, but if these operational questions are not addressed early, the design will be incomplete.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 12: AgentCore became more useful to me when I treated it as a runtime system, not just a hosting target
&lt;/h2&gt;

&lt;p&gt;This may be the broadest lesson of all.&lt;/p&gt;

&lt;p&gt;At first, I thought of AgentCore Runtime mainly as the place where the agent container would run. But while building with memory, I started appreciating it more as a runtime environment with clear operational boundaries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the runtime executes the agent,&lt;/li&gt;
&lt;li&gt;the framework manages reasoning and tools,&lt;/li&gt;
&lt;li&gt;the memory plane manages event continuity,&lt;/li&gt;
&lt;li&gt;the deployment workflow binds the right resources together.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That view helped me move beyond “deploy a model wrapper in a container” toward “operate an agent system with state, identity, and lifecycle.”&lt;/p&gt;

&lt;p&gt;For me, that was the real shift.&lt;/p&gt;

&lt;h2&gt;
  
  
  The technical pattern I would reuse
&lt;/h2&gt;

&lt;p&gt;If I were building the same class of agent again, I would reuse the same high-level pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a dedicated short-term memory resource.&lt;/li&gt;
&lt;li&gt;Resolve the correct memory resource during deployment.&lt;/li&gt;
&lt;li&gt;Pass memory identity into the runtime explicitly.&lt;/li&gt;
&lt;li&gt;Build the agent per request with user and session state.&lt;/li&gt;
&lt;li&gt;Load recent turns during agent initialization.&lt;/li&gt;
&lt;li&gt;Persist new messages when they are added.&lt;/li&gt;
&lt;li&gt;Keep replay windows bounded.&lt;/li&gt;
&lt;li&gt;Treat &lt;code&gt;actor_id&lt;/code&gt; and &lt;code&gt;session_id&lt;/code&gt; as core correctness boundaries.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I would also keep the same mental model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;short-term memory is for continuity,&lt;/li&gt;
&lt;li&gt;long-term memory is for durable recall,&lt;/li&gt;
&lt;li&gt;hooks are the right place for memory orchestration,&lt;/li&gt;
&lt;li&gt;deployment is part of memory architecture,&lt;/li&gt;
&lt;li&gt;observability should make degraded memory behavior visible.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;The biggest lesson I learned while building with Amazon Bedrock AgentCore Runtime is that memory is not something you sprinkle onto an agent once the rest of the system works. Memory changes the shape of the system.&lt;/p&gt;

&lt;p&gt;It affects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;request lifecycle,&lt;/li&gt;
&lt;li&gt;identity boundaries,&lt;/li&gt;
&lt;li&gt;prompt construction,&lt;/li&gt;
&lt;li&gt;deployment,&lt;/li&gt;
&lt;li&gt;observability,&lt;/li&gt;
&lt;li&gt;privacy,&lt;/li&gt;
&lt;li&gt;and tool coherence across turns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once I accepted that, the architecture became much more disciplined. The agent became easier to reason about, easier to operate, and much more capable in real multi-turn interactions.&lt;/p&gt;

&lt;p&gt;That is the lesson I would carry into any future AgentCore build: if the experience is meant to feel conversational, memory has to be designed as a first-class runtime concern from the beginning.&lt;/p&gt;

</description>
      <category>agentcore</category>
      <category>aws</category>
      <category>serverless</category>
      <category>agentskills</category>
    </item>
    <item>
      <title>API Gateway as Websocket</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Tue, 21 Jan 2025 07:49:42 +0000</pubDate>
      <link>https://dev.to/amitkayal/api-gateway-as-websocket-5eee</link>
      <guid>https://dev.to/amitkayal/api-gateway-as-websocket-5eee</guid>
      <description>&lt;h1&gt;
  
  
  API Gateway as websocket
&lt;/h1&gt;

&lt;h2&gt;
  
  
  API Gateway as WS Components
&lt;/h2&gt;

&lt;p&gt;WebSocket provides bidirectional, session-aware communication between caller and receiver, and it is a crucial component of real-time applications.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Setup API Gateway for WebSocket&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a WebSocket API in the Amazon API Gateway console or through infrastructure as code (IaC).&lt;/li&gt;
&lt;li&gt;Define the WebSocket API route selection expression, e.g., &lt;code&gt;$request.body.action&lt;/code&gt;. API Gateway evaluates it against each incoming message to decide which route to invoke.&lt;/li&gt;
&lt;li&gt;Define the following WebSocket routes:

&lt;ul&gt;
&lt;li&gt;$connect: Triggered when a client establishes a connection.&lt;/li&gt;
&lt;li&gt;$disconnect: Triggered when a client disconnects.&lt;/li&gt;
&lt;li&gt;Custom routes, e.g., sendMessage, to handle specific actions.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Create an Integration with AWS Lambda&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For each route ($connect, $disconnect, custom routes), integrate a Lambda function to handle the respective logic.&lt;/li&gt;
&lt;li&gt;Use the Lambda function's handler to process:

&lt;ul&gt;
&lt;li&gt;$connect: Store the connection in DynamoDB.&lt;/li&gt;
&lt;li&gt;$disconnect: Remove the connection from DynamoDB.&lt;/li&gt;
&lt;li&gt;Custom routes: Process the message and forward it to SQS.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;DynamoDB for Connection Management&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a DynamoDB table to store:

&lt;ul&gt;
&lt;li&gt;Connection ID (Primary Key).&lt;/li&gt;
&lt;li&gt;Session ID or other metadata for grouping connections.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;This table allows tracking active WebSocket connections for broadcasting messages.&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Configure SQS for Message Queue&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use an SQS FIFO queue for guaranteed order and deduplication.&lt;/li&gt;
&lt;li&gt;Messages processed in Lambda (custom routes) are sent to SQS for downstream services.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;IAM Roles and Permissions&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assign an IAM role to the API Gateway to invoke the integrated Lambda functions.&lt;/li&gt;
&lt;li&gt;Grant Lambda permissions to read/write from DynamoDB and send messages to SQS.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Client Connection and Messaging&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use WebSocket-compatible libraries (e.g., ws in Node.js or the WebSocket API in browsers) to:

&lt;ul&gt;
&lt;li&gt;Establish a WebSocket connection to the API Gateway endpoint.&lt;/li&gt;
&lt;li&gt;Send and receive messages using the WebSocket protocol.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
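
&lt;p&gt;The per-route Lambda logic described above can be sketched as a single handler. This is a minimal sketch under stated assumptions: &lt;code&gt;make_handler&lt;/code&gt; is a hypothetical helper, a plain dictionary stands in for the DynamoDB table, and a list stands in for the SQS queue so the flow can be followed without AWS access. The &lt;code&gt;routeKey&lt;/code&gt; and &lt;code&gt;connectionId&lt;/code&gt; fields are read from the event's &lt;code&gt;requestContext&lt;/code&gt;.&lt;/p&gt;

```python
import json


def make_handler(connections, queue):
    """Build a route handler.

    connections: dict-like store keyed by connection id (stand-in for DynamoDB).
    queue: list-like sink for messages (stand-in for an SQS FIFO queue).
    """

    def handler(event, context=None):
        request_context = event["requestContext"]
        route = request_context["routeKey"]
        connection_id = request_context["connectionId"]

        if route == "$connect":
            # Remember the connection so it can receive messages later.
            connections[connection_id] = {"connectionId": connection_id}
        elif route == "$disconnect":
            # Forget the connection; ignore ids that were never stored.
            connections.pop(connection_id, None)
        else:
            # Custom route: forward the parsed message for downstream processing.
            body = json.loads(event.get("body") or "{}")
            queue.append(
                {"connectionId": connection_id, "route": route, "payload": body}
            )
        return {"statusCode": 200}

    return handler
```

&lt;p&gt;In a real deployment the two stand-ins would be replaced by calls to the DynamoDB table and the SQS queue, using the permissions granted in the IAM step.&lt;/p&gt;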

&lt;h2&gt;
  
  
  Architecture of the WebSocket mechanism
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;WebSocket Client:

&lt;ul&gt;
&lt;li&gt;Initiates WebSocket connection and communicates via send() and onmessage().&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;API Gateway (WebSocket API):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manages WebSocket connections and invokes Lambda functions for defined routes.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Route Integration (Lambda Functions):&lt;br&gt;
Every route should have an integration. Common integration types include Mock, HTTP, and Lambda.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$connect: Adds connection metadata to DynamoDB.&lt;/li&gt;
&lt;li&gt;$disconnect: Removes connection metadata from DynamoDB.&lt;/li&gt;
&lt;li&gt;$default: Selected when the route selection expression cannot be evaluated against the message or matches no other route.&lt;/li&gt;
&lt;li&gt;Custom routes: Selected based on message content; the integration processes the message and forwards it to SQS.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;DynamoDB:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maintains active connection records, including connectionId and associated metadata.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;SQS FIFO Queue:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Queues messages for downstream processing, ensuring delivery order and deduplication.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Downstream Services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Processes messages from SQS and performs actions like notifications, data updates, or storage.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
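
&lt;p&gt;To make the routing step concrete, here is a small simulation of how a route selection expression of &lt;code&gt;$request.body.action&lt;/code&gt; might map a message to a route. API Gateway performs this evaluation itself; &lt;code&gt;select_route&lt;/code&gt; is a hypothetical function that only mirrors the behaviour described above, including the fallback to &lt;code&gt;$default&lt;/code&gt;.&lt;/p&gt;

```python
import json


def select_route(raw_message, defined_routes):
    """Return the route key a message would be dispatched to."""
    try:
        action = json.loads(raw_message).get("action")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: nothing to evaluate.
        action = None
    if isinstance(action, str) and action in defined_routes:
        return action
    return "$default"
```

&lt;p&gt;For example, a message of &lt;code&gt;{"action": "sendMessage"}&lt;/code&gt; resolves to the sendMessage route when that route is defined, while plain text or an unknown action falls through to &lt;code&gt;$default&lt;/code&gt;.&lt;/p&gt;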

&lt;h2&gt;
  
  
  Security
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Authentication and Authorization
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Custom Authorizer (Lambda Authorizer)&lt;br&gt;
It can only be used for the $connect route.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a Lambda Authorizer to validate custom tokens or headers sent during connection attempts.&lt;/li&gt;
&lt;li&gt;Example:

&lt;ul&gt;
&lt;li&gt;Validate a JWT token from an identity provider (e.g., Cognito, Auth0).&lt;/li&gt;
&lt;li&gt;Check the token against allowed users or roles.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Amazon Cognito:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Amazon Cognito for user authentication.&lt;/li&gt;
&lt;li&gt;Validate Cognito-issued tokens in connection requests. WebSocket APIs do not offer a built-in Cognito authorizer, so this is typically done inside a Lambda authorizer.&lt;/li&gt;
&lt;li&gt;Best suited for applications with user pools.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
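
&lt;p&gt;A Lambda authorizer for the $connect route could look roughly like the sketch below. The token check is deliberately simplified: &lt;code&gt;ALLOWED_TOKENS&lt;/code&gt; is a hypothetical allow-list standing in for real JWT validation against an identity provider, and reading the token from a &lt;code&gt;token&lt;/code&gt; query string parameter is an assumption of this example. The returned structure is an IAM policy that allows or denies &lt;code&gt;execute-api:Invoke&lt;/code&gt; on the requested resource.&lt;/p&gt;

```python
# Hypothetical allow-list standing in for real token validation (for example,
# verifying a JWT signature and claims against an identity provider).
ALLOWED_TOKENS = {"demo-token": "user-123"}


def authorizer_handler(event, context=None):
    """Lambda authorizer sketch for the $connect route."""
    # Browser WebSocket clients cannot set arbitrary handshake headers, so
    # this sketch reads the token from the query string instead.
    params = event.get("queryStringParameters") or {}
    principal = ALLOWED_TOKENS.get(params.get("token"))
    effect = "Allow" if principal else "Deny"
    return {
        "principalId": principal or "anonymous",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": event["methodArn"],
                }
            ],
        },
    }
```

&lt;p&gt;Denying by default keeps unknown or missing tokens from ever reaching the $connect integration.&lt;/p&gt;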

&lt;h3&gt;
  
  
  Secure WebSocket Connections
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Always use the secure WebSocket protocol (wss://). API Gateway enforces HTTPS/TLS, ensuring encrypted communication.&lt;/li&gt;
&lt;li&gt;Associate a custom domain with the API Gateway WebSocket endpoint. We should use AWS Certificate Manager (ACM) to manage SSL/TLS certificates.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  IP Whitelisting and Blacklisting
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;We should attach AWS WAF to API Gateway and block or allow requests based on IP addresses or CIDR ranges. We should also use rate limiting to protect against DDoS attacks.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  API Gateway Throttling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;We can set rate and burst limits on API Gateway routes to limit the rate of requests per client.&lt;/li&gt;
&lt;li&gt;We can create API keys, associate them with a usage plan, and then limit the number of allowed requests per API key.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Environment-based Access Control:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;We should always use distinct stages (e.g., dev, prod) and restrict connections to the production API through IP rules.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tools to test
&lt;/h2&gt;

&lt;p&gt;The following tools can be used to test WebSocket APIs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Piesocket&lt;/li&gt;
&lt;li&gt;Postman&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>apigateway</category>
      <category>api</category>
    </item>
    <item>
      <title>S3 table &amp; S3 Metadata table</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 09 Dec 2024 18:26:23 +0000</pubDate>
      <link>https://dev.to/aws-builders/s3-table-s3-metadata-table-91i</link>
      <guid>https://dev.to/aws-builders/s3-table-s3-metadata-table-91i</guid>
      <description>&lt;h2&gt;
  
  
  Open table format and its architecture
&lt;/h2&gt;

&lt;p&gt;Open table formats, such as Apache Iceberg, Apache Hudi, and Delta Lake, have gained popularity in data analytics mainly because of the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ACID Transactions: Open table formats (e.g., Apache Iceberg, Delta Lake) ensure reliable and consistent data updates, even with concurrent access.&lt;/li&gt;
&lt;li&gt;Schema Evolution: They allow seamless updates to schemas without disrupting existing pipelines, simplifying data management. Metadata tracks the changes to the dataset: the files held in the data layer are captured by the metadata files held in the metadata layer, and as the data files change, the metadata files track those changes.&lt;/li&gt;
&lt;li&gt;Optimized Queries: Partitioning and indexing enable faster queries by scanning only relevant data, improving performance and cost-efficiency.&lt;/li&gt;
&lt;li&gt;Time Travel: Users can access historical versions of data for debugging, compliance, or analytics.&lt;/li&gt;
&lt;li&gt;Interoperability: These formats integrate seamlessly with big data tools like Spark, Flink, and Presto, making them versatile and widely adopted.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Open file format
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkl9mm5r6t0aqp4uy7dqa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkl9mm5r6t0aqp4uy7dqa.png" alt="img" width="750" height="588"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  S3 table
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;p&gt;Amazon S3 Tables is optimized for analytics workloads. It is designed to continuously enhance query performance and reduce storage costs for tabular data, which makes it very promising if you are working with a lakehouse architecture.&lt;br&gt;
&lt;strong&gt;A new bucket type, the table bucket, has been introduced to support this; it organizes tables as sub-resources. Like any other AWS resource, it has an ARN and can take a resource policy, and as a unique feature it has a dedicated endpoint.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3 Tables are intended explicitly for storing data in a tabular format, such as daily purchase transactions, streaming sensor data, or ad impressions. This data is organized into columns and rows like a database table.&lt;/li&gt;
&lt;li&gt;Table buckets support storing tables in the Apache Iceberg format. You can query these tables using standard SQL in query engines that support Iceberg.&lt;/li&gt;
&lt;li&gt;Data files and metadata files can be read and written. Direct delete and update operations on them are not allowed, which preserves data integrity.&lt;/li&gt;
&lt;li&gt;Compatible query engines include Amazon Athena, Amazon Redshift, and Apache Spark.&lt;/li&gt;
&lt;li&gt;S3 Tables automatically performs maintenance tasks like compaction and snapshot management to optimize your tables for querying, including removing unreferenced files.&lt;/li&gt;
&lt;li&gt;S3 Tables offers access management at both the table and the bucket level.&lt;/li&gt;
&lt;li&gt;It provides fully managed Apache Iceberg tables in S3.&lt;/li&gt;
&lt;li&gt;It supports automatic compaction of underlying files to improve query performance and reduce latency.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  S3 Table buckets namespace
&lt;/h3&gt;

&lt;p&gt;A namespace logically groups related S3 tables, which gives us finer control over them at the namespace level. It helps with the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logical segmentation of data and multi-tenancy

&lt;ul&gt;
&lt;li&gt;Supports multi-tenancy through separate namespaces, which helps meet data isolation requirements in regulated industries.&lt;/li&gt;
&lt;li&gt;Separates tables by application, project, and so on.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;prevent naming conflicts

&lt;ul&gt;
&lt;li&gt;Each namespace acts like a "container," allowing tables with the same name in different namespaces without conflicts.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Better Access Control

&lt;ul&gt;
&lt;li&gt;Policies can grant or restrict access to specific namespaces, ensuring data security and compliance.  It also reduces the risk of unauthorized access to unrelated tables in the same bucket.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Easy data management

&lt;ul&gt;
&lt;li&gt;Makes it easier to query, update, or delete related tables in bulk.&lt;/li&gt;
&lt;li&gt;Simplifies metadata management for tables grouped under a namespace.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Advanced workflows based on namespace

&lt;ul&gt;
&lt;li&gt;It helps to simplify automation for data pipelines or real-time analytics applications.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  S3 table operation &amp;amp; management
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Table Operation&lt;/strong&gt;&lt;br&gt;
These are similar to CRUD operations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;List tables&lt;/li&gt;
&lt;li&gt;Create table&lt;/li&gt;
&lt;li&gt;Get table metadata location&lt;/li&gt;
&lt;li&gt;Update table metadata location&lt;/li&gt;
&lt;li&gt;Delete table&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Table Management&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Put Table Policy&lt;/li&gt;
&lt;li&gt;Put Table Bucket Policy&lt;/li&gt;
&lt;li&gt;Put Table Maintenance Config&lt;/li&gt;
&lt;li&gt;Put Table Bucket Maintenance Config&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Policies related to S3 table operation
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Allow access to create and use table buckets
&lt;/h3&gt;

&lt;p&gt;Here, Action lists the specific actions the policy allows. These actions are specific to S3 Tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;s3tables:CreateTableBucket: Grants permission to create a table bucket in S3 Tables.&lt;/li&gt;
&lt;li&gt;s3tables:PutTableBucketPolicy: Allows setting or updating the bucket policy for a table bucket.&lt;/li&gt;
&lt;li&gt;s3tables:GetTableBucketPolicy: Allows retrieving the bucket policy associated with a table bucket.&lt;/li&gt;
&lt;li&gt;s3tables:ListTableBuckets: Allows listing all table buckets within the specified scope.&lt;/li&gt;
&lt;li&gt;s3tables:GetTableBucket: Grants permission to access the metadata of a specific table bucket.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Resource defines the scope of the resources these actions can apply to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"arn:aws:s3tables:region:account_id:bucket/*": Specifies all table buckets in the account (account_id) and Region (region).&lt;/li&gt;
&lt;li&gt;The * after bucket/ indicates that permissions apply to all table buckets under this account and Region.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowBucketActionsForUser",
        "Effect": "Allow",
        "Action": [
            "s3tables:CreateTableBucket",
            "s3tables:PutTableBucketPolicy",
            "s3tables:GetTableBucketPolicy",
            "s3tables:ListTableBuckets",
            "s3tables:GetTableBucket"
        ],
        "Resource": "arn:aws:s3tables:region:account_id:bucket/*"
    }]
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Allow access to create and use tables in a table bucket
&lt;/h3&gt;

&lt;p&gt;Here, Action lists the specific S3 Tables actions allowed by the policy. &lt;em&gt;Note that the first policy covered creating and managing table buckets, but it did not include table-level operations such as creating tables, querying data, or updating table metadata. Those are the operations where namespaces become relevant.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;s3tables:CreateTable: Allows creating new tables in the specified table bucket. &lt;/li&gt;
&lt;li&gt;s3tables:PutTableData: Grants permission to write data to tables within the table bucket. &lt;/li&gt;
&lt;li&gt;s3tables:GetTableData: Allows reading data from tables in the bucket.&lt;/li&gt;
&lt;li&gt;s3tables:GetTableMetadataLocation: Allows retrieving metadata location information for a table.&lt;/li&gt;
&lt;li&gt;s3tables:UpdateTableMetadataLocation: Grants permission to update the metadata location of a table. &lt;/li&gt;
&lt;li&gt;s3tables:GetNamespace: Allows retrieving namespace information associated with the table bucket. &lt;/li&gt;
&lt;li&gt;s3tables:CreateNamespace: Grants permission to create namespaces for organizing table data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Resource section specifies&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grants permissions on the bucket named amzn-s3-demo-table-bucket&lt;/li&gt;
&lt;li&gt;Grants permissions on all tables within the amzn-s3-demo-table-bucket
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
     "Version": "2012-10-17",
     "Statement": [ 
         {
             "Sid": "AllowBucketActions",
             "Effect": "Allow",
             "Action": [
                 "s3tables:CreateTable",
                 "s3tables:PutTableData",
                 "s3tables:GetTableData",
                 "s3tables:GetTableMetadataLocation",
                 "s3tables:UpdateTableMetadataLocation",
                 "s3tables:GetNamespace",
                 "s3tables:CreateNamespace"
             ],

             "Resource": [
               "arn:aws:s3tables:region:account_id:bucket/amzn-s3-demo-table-bucket",
               "arn:aws:s3tables:region:account_id:bucket/amzn-s3-demo-table-bucket/table/*"
            ]
         }
     ]
 }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Table bucket policy to allow read access to a namespace
&lt;/h4&gt;

&lt;p&gt;This policy allows reading S3 tables from a namespace. Here, Action lists the specific S3 Tables actions allowed by the policy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;s3tables:GetTableData: Allows reading data from tables in the bucket.&lt;/li&gt;
&lt;li&gt;s3tables:GetTableMetadataLocation: Allows retrieving metadata location information for a table.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Resource section covers all S3 tables under the bucket amzn-s3-demo-table-bucket1, and the s3tables:namespace condition then restricts access to tables in the hr namespace only.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
     "Version": "2012-10-17",
     "Statement": [ 
         {
             "Effect": "Allow",
             "Action": [
             "Principal": {
               "AWS": "arn:aws:iam::123456789012:user/Jane"
             },
             "Action": [
                  "s3tables:GetTableData", 
                  "s3tables:GetTableMetadataLocation"
             ],
             "Resource":{ "arn:aws:s3tables:region:account_id:bucket/amzn-s3-demo-table-bucket1/table/*”}
             "Condition": { 
                  "StringLike": { "s3tables:namespace": "hr" } 
             }
     ]
 }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
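
&lt;p&gt;Hand-written policy JSON is easy to get subtly wrong (a missing comma or brace is enough), so one option is to build the document in code and serialize it. The sketch below assumes a hypothetical helper named &lt;code&gt;namespace_read_policy&lt;/code&gt;; the actions and the &lt;code&gt;s3tables:namespace&lt;/code&gt; condition key are the ones used in the policy above, and the ARNs passed in are placeholders.&lt;/p&gt;

```python
import json


def namespace_read_policy(principal_arn, table_bucket_arn, namespace):
    """Return a table bucket policy (as JSON text) granting read access
    to the tables of one namespace."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": [
                    "s3tables:GetTableData",
                    "s3tables:GetTableMetadataLocation",
                ],
                # All tables in the bucket; the condition narrows the match.
                "Resource": f"{table_bucket_arn}/table/*",
                "Condition": {"StringLike": {"s3tables:namespace": namespace}},
            }
        ],
    }
    # json.dumps always emits syntactically valid JSON.
    return json.dumps(policy, indent=2)
```

&lt;p&gt;Calling it with the user ARN, the table bucket ARN, and &lt;code&gt;"hr"&lt;/code&gt; produces a document equivalent to the one shown above.&lt;/p&gt;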

&lt;h2&gt;
  
  
  S3 table automatic maintenance
&lt;/h2&gt;

&lt;p&gt;It provides automated maintenance through configurations that help simplify table management, optimize performance, and reduce operational overhead.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Table Lifecycle Management

&lt;ul&gt;
&lt;li&gt;We can add S3 Tables configurations that include lifecycle policies to automatically handle data expiration, transitions, or archival.&lt;/li&gt;
&lt;li&gt;Automatic snapshot expiration can be configured easily.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Data Compaction

&lt;ul&gt;
&lt;li&gt;S3 Tables automatically compacts small files (often produced by incremental writes) into larger, optimized files. This speeds up queries and reduces storage cost.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Schema Evolution

&lt;ul&gt;
&lt;li&gt;Automated checks ensure compatibility between new and existing data.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Metadata Optimization

&lt;ul&gt;
&lt;li&gt;Indexing of metadata for faster querying and retrieval of table details.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of these can be set up through policy-based configuration.&lt;/p&gt;
&lt;h3&gt;
  
  
  Policy for snapshot management
&lt;/h3&gt;

&lt;p&gt;By configuring the maximum snapshot age, we can specify the retention period for table snapshots. The following example configures S3 Tables to automatically retain only the snapshots from the last 30 days (720 hours), while always keeping at least one snapshot.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;minSnapshotsToKeep: Ensures that at least one snapshot is always retained, regardless of age. &lt;/li&gt;
&lt;li&gt;maxSnapshotAgeHours: Specifies the maximum age (in hours) for snapshots to be retained.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws s3tables put-table-maintenance-configuration \
    --table-bucket-arn arn:aws:s3tables:region:account_id:bucket/bucket_name \
    --namespace namespace_name \
    --name table_name \
    --type icebergSnapshotManagement \
    --value '{
        "status": "enabled",
        "settings": {
            "icebergSnapshotManagement": {
                "minSnapshotsToKeep": 1,
                "maxSnapshotAgeHours": 720
            }
        }
    }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
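
&lt;p&gt;To see how a minimum snapshot count and a maximum snapshot age interact, here is a small simulation. It is only an illustration of the retention rule as described above, not the service's actual implementation; &lt;code&gt;snapshots_to_expire&lt;/code&gt; is a hypothetical function that takes snapshot ages in hours.&lt;/p&gt;

```python
def snapshots_to_expire(ages_hours, min_keep, max_age_hours):
    """Return the ages of snapshots that would be expired.

    A snapshot is expired only if it is older than max_age_hours AND it is
    not among the min_keep newest snapshots.
    """
    newest_first = sorted(ages_hours)
    expired = []
    for index, age in enumerate(newest_first):
        protected = index in range(min_keep)  # the newest ones are always kept
        if age > max_age_hours and not protected:
            expired.append(age)
    return expired
```

&lt;p&gt;With a 720-hour limit and a minimum of one snapshot, a table whose only snapshots are 800 and 900 hours old would keep the 800-hour one, because it is the newest.&lt;/p&gt;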

&lt;h2&gt;
  
  
  S3 Table Integration with AWS Analytics
&lt;/h2&gt;

&lt;p&gt;S3 Tables integrate seamlessly with AWS analytics services to enable querying, processing and insight generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Athena - Run serverless SQL queries on S3 Tables&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use AWS Glue to create a Data Catalog for S3 Tables.&lt;/li&gt;
&lt;li&gt;Query data directly using SQL in Athena.&lt;/li&gt;
&lt;li&gt;Leverage the Apache Iceberg table format, typically with Parquet data files, for optimized performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS Glue - Automate ETL processes for S3 Tables&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Glue Crawlers to discover table metadata.&lt;/li&gt;
&lt;li&gt;Create ETL jobs to transform and load data into S3 Tables or other destinations.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  S3 Metadata table
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;It includes system metadata, along with object tags and user-defined metadata.&lt;/li&gt;
&lt;li&gt;The metadata is stored in an S3 table.&lt;/li&gt;
&lt;li&gt;It is generated in near real time as data is created, so it can be queried within minutes.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Use cases for S3 Metadata tables
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Real-Time Analytics

&lt;ul&gt;
&lt;li&gt;Run efficient queries on metadata to identify relevant data partitions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Machine Learning Pipelines

&lt;ul&gt;
&lt;li&gt;Use metadata tables to filter, select, and partition data for model training.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Governance and Compliance

&lt;ul&gt;
&lt;li&gt;Track data retention and enforce lifecycle policies via metadata.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Multi-Tenant Data Applications

&lt;ul&gt;
&lt;li&gt;Use namespaces within metadata tables to logically isolate tenant data.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Data Cataloging and Discovery

&lt;ul&gt;
&lt;li&gt;Use metadata queries to identify datasets matching specific criteria.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is a sample Python function that queries the metadata table through Athena.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def query_metadata_table(criteria):

    query = f"""
        SELECT *
        FROM {DATABASE}.{TABLE}
        WHERE {criteria}
    """

    print(f"Running query: {query}")

    # Start Athena query
    response = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={'Database': DATABASE},
        ResultConfiguration={'OutputLocation': S3_OUTPUT}
    )

    query_execution_id = response['QueryExecutionId']

    # Wait for query completion
    print("Waiting for query to complete...")
    while True:
        status = athena_client.get_query_execution(QueryExecutionId=query_execution_id)
        state = status['QueryExecution']['Status']['State']
        if state in ['SUCCEEDED', 'FAILED', 'CANCELLED']:
            break
        time.sleep(2)

    if state != 'SUCCEEDED':
        raise Exception(f"Query failed with state: {state}")

    # Retrieve results
    results = athena_client.get_query_results(QueryExecutionId=query_execution_id)
    datasets = []
    for row in results['ResultSet']['Rows'][1:]:  # Skip the header row
        # NULL cells carry no 'VarCharValue' key, so fall back to None
        datasets.append([col.get('VarCharValue') for col in row['Data']])

    print(f"Query returned {len(datasets)} datasets matching the criteria.")
    return datasets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
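
&lt;p&gt;Since the &lt;code&gt;criteria&lt;/code&gt; string ends up inside the SQL text, it helps to build it from structured filters rather than by hand. The following is a minimal sketch of one way to do that; &lt;code&gt;build_criteria&lt;/code&gt; is a hypothetical helper, it only supports equality filters joined with AND, and the column names used with it are whatever your metadata table actually exposes.&lt;/p&gt;

```python
import re

# Column names are limited to plain identifiers so they cannot carry SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_criteria(filters):
    """Turn {"column": value} pairs into an 'a = x AND b = y' SQL fragment."""
    if not filters:
        raise ValueError("at least one filter is required")
    clauses = []
    for column, value in filters.items():
        if not _IDENTIFIER.fullmatch(column):
            raise ValueError(f"unsafe column name: {column!r}")
        if isinstance(value, bool):
            raise TypeError("boolean filters are not handled in this sketch")
        if isinstance(value, (int, float)):
            clauses.append(f"{column} = {value}")
        else:
            # Escape single quotes by doubling them, as SQL string literals require.
            escaped = str(value).replace("'", "''")
            clauses.append(f"{column} = '{escaped}'")
    return " AND ".join(clauses)
```

&lt;p&gt;The result can then be passed to &lt;code&gt;query_metadata_table&lt;/code&gt; as its &lt;code&gt;criteria&lt;/code&gt; argument.&lt;/p&gt;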



</description>
      <category>aws</category>
      <category>s3</category>
      <category>analytics</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Brief Notes on AWS CodeDeploy</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Thu, 21 Mar 2024 19:04:04 +0000</pubDate>
      <link>https://dev.to/aws-builders/brief-notes-on-aws-codedeploy-2731</link>
      <guid>https://dev.to/aws-builders/brief-notes-on-aws-codedeploy-2731</guid>
      <description>&lt;p&gt;Service that automates code deployments to any instance, including Amazon EC2 instances and instances running on-premises.&lt;/p&gt;

&lt;h2&gt;
  
  
  Supported Platforms/Deployment Types:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;EC2/On-Premises: In-Place or Blue/Green Deployments&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Describes instances of physical servers that can be Amazon EC2 cloud instances, on-premises servers, or both. Applications created using the EC2/On-Premises compute platform can be composed of executable files, configuration files, images, and more.&lt;/li&gt;
&lt;li&gt;Deployments that use the EC2/On-Premises compute platform manage the way in which traffic is directed to instances by using an in-place or blue/green deployment type.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;AWS Lambda: Canary, Linear, All-At-Once Deployments&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Applications created using the AWS Lambda compute platform can manage the way in which traffic is directed to the updated Lambda function versions during a deployment by choosing a canary, linear, or all-at-once configuration.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Amazon ECS: Blue/Green Deployment&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Used to deploy an Amazon ECS containerized application as a task set. &lt;/li&gt;
&lt;li&gt;CodeDeploy performs a blue/green deployment by installing an updated version of the containerized application as a new replacement task set. CodeDeploy reroutes production traffic from the original application, or task set, to the replacement task set. The original task set is terminated after a successful deployment.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
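
&lt;p&gt;The difference between the canary and linear options for Lambda is easiest to see as a timeline. The sketch below models them under simple assumptions: &lt;code&gt;canary_schedule&lt;/code&gt; and &lt;code&gt;linear_schedule&lt;/code&gt; are hypothetical helpers, the first shift happens at minute 0, and each entry is a pair of (minute, percent of traffic on the new version).&lt;/p&gt;

```python
def canary_schedule(first_percent, wait_minutes):
    """Two steps: a small first shift, then everything after the wait."""
    if first_percent not in range(1, 100):
        raise ValueError("first_percent must be between 1 and 99")
    return [(0, first_percent), (wait_minutes, 100)]


def linear_schedule(step_percent, interval_minutes):
    """Equal steps at a fixed interval until all traffic has shifted."""
    if step_percent not in range(1, 101):
        raise ValueError("step_percent must be between 1 and 100")
    schedule = []
    shifted, minute = 0, 0
    while shifted != 100:
        shifted = min(100, shifted + step_percent)
        schedule.append((minute, shifted))
        minute += interval_minutes
    return schedule
```

&lt;p&gt;For instance, a canary of 10 percent with a 5 minute wait yields two steps, while a linear shift of 10 percent every minute yields ten. All-at-once corresponds to a single step of 100 percent.&lt;/p&gt;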

&lt;h2&gt;
  
  
  Deployment approach for EC2
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Deploys a revision to a set of instances.&lt;/li&gt;
&lt;li&gt;Deploys a new revision that consists of an application and AppSpec file. The AppSpec specifies how to deploy the application to the instances in a deployment group.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezsau98rpjq1qlkvy69j.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezsau98rpjq1qlkvy69j.jpg" alt="URL" width="635" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment approach for Lambda
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Deploys a new version of a serverless Lambda function on a high-availability compute infrastructure.&lt;/li&gt;
&lt;li&gt;Shifts production traffic from one version of a Lambda function to a new version of the same function. The AppSpec file specifies which Lambda function version to deploy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gabfs5volddde900t0u.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gabfs5volddde900t0u.jpg" alt="url" width="660" height="290"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment approach for ECS
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Deploys an updated version of an Amazon ECS containerized application as a new, replacement task set. CodeDeploy reroutes production traffic from the task set with the original version to the new replacement task set with the updated version. When the deployment completes, the original task set is terminated.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobk0qy9jw9jw03ddevli.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobk0qy9jw9jw03ddevli.jpg" alt="URL" width="660" height="294"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  App Spec File
&lt;/h2&gt;

&lt;p&gt;The application specification file (AppSpec file) is a YAML-formatted or JSON-formatted file used by CodeDeploy to manage a deployment. Note: the name of the AppSpec file for an EC2/On-Premises deployment must be appspec.yml. The name of the AppSpec file for an Amazon ECS or AWS Lambda deployment must be appspec.yml.&lt;/p&gt;

&lt;p&gt;For ECS&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The container and port in the replacement task set to which your Application Load Balancer or Network Load Balancer reroutes traffic during a deployment. This is specified with the &lt;code&gt;LoadBalancerInfo&lt;/code&gt; instruction in the AppSpec file.&lt;/li&gt;
&lt;li&gt;The Amazon ECS task definition. This is specified by its ARN in the &lt;code&gt;TaskDefinition&lt;/code&gt; instruction in the AppSpec file.&lt;/li&gt;
&lt;/ul&gt;
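
&lt;p&gt;As a rough illustration of the two ECS items above, here is a minimal sketch that builds an ECS AppSpec as a Python dictionary and prints it as JSON, which is one of the two formats mentioned earlier. The helper name &lt;code&gt;build_ecs_appspec&lt;/code&gt;, the task definition ARN, the container name, and the port are all hypothetical values chosen for the example.&lt;/p&gt;

```python
import json


def build_ecs_appspec(task_definition_arn, container_name, container_port):
    """Return an ECS AppSpec document as a plain dict (illustrative helper)."""
    return {
        "version": 0.0,
        "Resources": [
            {
                "TargetService": {
                    "Type": "AWS::ECS::Service",
                    "Properties": {
                        # ARN of the task definition revision to deploy
                        "TaskDefinition": task_definition_arn,
                        # Container and port that receive the rerouted traffic
                        "LoadBalancerInfo": {
                            "ContainerName": container_name,
                            "ContainerPort": container_port,
                        },
                    },
                }
            }
        ],
    }


# Hypothetical values, for illustration only
appspec = build_ecs_appspec(
    "arn:aws:ecs:us-east-1:111122223333:task-definition/my-task:2",
    "my-container",
    8080,
)
print(json.dumps(appspec, indent=2))
```

&lt;p&gt;Keeping the AppSpec in code like this can make it easier to substitute the task definition ARN produced by an earlier pipeline step, though a static file works just as well for simple cases.&lt;/p&gt;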

&lt;p&gt;For Lambda&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda function version to deploy.&lt;/li&gt;
&lt;li&gt;Lambda functions to use as validation tests.&lt;/li&gt;
&lt;/ul&gt;
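
&lt;p&gt;The Lambda case can be sketched the same way. The example below assumes the usual AppSpec layout for Lambda, in which the resource lists the current and target versions and the &lt;code&gt;Hooks&lt;/code&gt; section names the validation functions. The helper &lt;code&gt;build_lambda_appspec&lt;/code&gt; and every function name, alias, and version number are hypothetical.&lt;/p&gt;

```python
import json


def build_lambda_appspec(function_name, alias, current_version, target_version,
                         before_hook=None, after_hook=None):
    """Return a Lambda AppSpec document as a plain dict (illustrative helper)."""
    appspec = {
        "version": 0.0,
        "Resources": [
            {
                function_name: {
                    "Type": "AWS::Lambda::Function",
                    "Properties": {
                        "Name": function_name,
                        "Alias": alias,
                        # Version currently serving traffic and version to shift to
                        "CurrentVersion": current_version,
                        "TargetVersion": target_version,
                    },
                }
            }
        ],
    }
    # Validation functions run before and after traffic is shifted
    hooks = []
    if before_hook:
        hooks.append({"BeforeAllowTraffic": before_hook})
    if after_hook:
        hooks.append({"AfterAllowTraffic": after_hook})
    if hooks:
        appspec["Hooks"] = hooks
    return appspec


# Hypothetical names and versions, for illustration only
lambda_appspec = build_lambda_appspec(
    "my-function", "live", "1", "2",
    before_hook="validate-before-traffic",
    after_hook="validate-after-traffic",
)
print(json.dumps(lambda_appspec, indent=2))
```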

&lt;p&gt;For EC2&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which lifecycle event hooks to run in response to deployment lifecycle events.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>cicd</category>
    </item>
    <item>
      <title>Bedrock Agent &amp; Tools - Tracing Best practises</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Wed, 20 Mar 2024 17:58:52 +0000</pubDate>
      <link>https://dev.to/aws-builders/bedrock-agent-tools-tracing-best-practises-4217</link>
      <guid>https://dev.to/aws-builders/bedrock-agent-tools-tracing-best-practises-4217</guid>
      <description>&lt;p&gt;Many Bedrock Agent users have a use case in which several Lambda functions are attached to a Bedrock Agent to perform different tasks, and they are looking for guidance on debugging the API calls and responses from the agent and the Lambda functions.&lt;/p&gt;

&lt;p&gt;Here are some of the approaches that we have been using and have found quite effective for tracking and tracing agents and their tool usage:&lt;/p&gt;

&lt;ul&gt;
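&lt;li&gt;Enable Tracing for the Agent: When invoking the agent through the &lt;code&gt;InvokeAgent&lt;/code&gt; API, set the &lt;code&gt;enableTrace&lt;/code&gt; parameter to &lt;code&gt;true&lt;/code&gt;. This enables detailed tracing of the agent's execution, including the tools (Lambda functions) it invokes and their responses. The trace events are returned as part of the agent's streamed response, alongside the answer chunks. [1] Example (Python, using boto3; &lt;code&gt;agent_id&lt;/code&gt;, &lt;code&gt;agent_alias_id&lt;/code&gt;, &lt;code&gt;session_id&lt;/code&gt;, and &lt;code&gt;query&lt;/code&gt; are placeholders for your own values): &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;import boto3&lt;br&gt;
response = boto3.client("bedrock-agent-runtime").invoke_agent(agentId=agent_id, agentAliasId=agent_alias_id, sessionId=session_id, inputText=query, enableTrace=True)&lt;/code&gt;&lt;/p&gt;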
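
&lt;p&gt;When the agent is invoked through boto3's &lt;code&gt;bedrock-agent-runtime&lt;/code&gt; client with &lt;code&gt;enableTrace=True&lt;/code&gt;, the response carries an event stream under &lt;code&gt;completion&lt;/code&gt; in which answer chunks and trace events are interleaved. The sketch below shows one way to separate the two. The helper name &lt;code&gt;split_agent_stream&lt;/code&gt; and the sample events are assumptions made for illustration; real trace payloads contain more fields than shown here.&lt;/p&gt;

```python
import json


def split_agent_stream(completion_events):
    """Separate answer text from trace events in an agent event stream."""
    answer_parts = []
    traces = []
    for event in completion_events:
        if "chunk" in event:
            # Answer text arrives as UTF-8 bytes
            answer_parts.append(event["chunk"]["bytes"].decode("utf-8"))
        elif "trace" in event:
            # Trace events describe the agent's reasoning and tool calls
            traces.append(event["trace"])
    return "".join(answer_parts), traces


# Stand-in events shaped like the stream; a real call would pass
# response["completion"] from invoke_agent instead.
sample_events = [
    {"trace": {"trace": {"orchestrationTrace": {"rationale": {"text": "Look up the order"}}}}},
    {"chunk": {"bytes": b"Order 42 has shipped."}},
]

answer, traces = split_agent_stream(sample_events)
print(answer)
for trace in traces:
    print(json.dumps(trace, default=str))
```

&lt;p&gt;Writing each trace as one JSON line makes it straightforward to ship the output to a log store and search it later.&lt;/p&gt;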

&lt;ul&gt;
&lt;li&gt;Log Within Lambda Functions: Within each of your Lambda functions (tools), add logging statements to capture relevant information and events. You can use AWS Lambda's built-in logging capabilities or integrate with a centralized logging service like Amazon CloudWatch Logs. [2] Example (Python): &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;import logging&lt;br&gt;
logger = logging.getLogger(__name__)&lt;br&gt;
logger.setLevel(logging.INFO)  # without this, INFO records are filtered out by the default WARNING level&lt;br&gt;
def lambda_handler(event, context):&lt;br&gt;
    logger.info(f"Received event: {event}")&lt;br&gt;
    result = {"status": "ok"}  # placeholder: your Lambda function's logic here&lt;br&gt;
    logger.info(f"Returning result: {result}")&lt;br&gt;
    return result&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Correlate Logs Using Request IDs or Tracing IDs: To correlate logs across multiple Lambda functions and the agent, you can use request IDs or tracing IDs. Pass a unique ID as part of the event or context to your Lambda functions and include it in your log statements. This will allow you to trace the flow of events across different components of your system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;import logging&lt;br&gt;
import uuid&lt;br&gt;
logger = logging.getLogger(__name__)&lt;br&gt;
logger.setLevel(logging.INFO)&lt;br&gt;
def lambda_handler(event, context):&lt;br&gt;
    # Reuse the caller's ID when present; otherwise generate a new one&lt;br&gt;
    request_id = event.get("request_id", str(uuid.uuid4()))&lt;br&gt;
    logger.info(f"[{request_id}] Received event: {event}")&lt;br&gt;
    result = {"request_id": request_id}  # placeholder: your Lambda function's logic here&lt;br&gt;
    logger.info(f"[{request_id}] Returning result: {result}")&lt;br&gt;
    return result&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;
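
&lt;p&gt;If you prefer not to repeat the ID in every log call, a &lt;code&gt;logging.LoggerAdapter&lt;/code&gt; subclass can add it automatically. The following is a minimal sketch under that assumption. The class names &lt;code&gt;RequestIdAdapter&lt;/code&gt; and &lt;code&gt;ListHandler&lt;/code&gt;, the logger name, and the sample event are hypothetical, and the in-memory handler exists only so the output can be inspected outside Lambda.&lt;/p&gt;

```python
import logging
import uuid


class RequestIdAdapter(logging.LoggerAdapter):
    """Prefix every message with the request ID stored in `extra`."""

    def process(self, msg, kwargs):
        return f"[{self.extra['request_id']}] {msg}", kwargs


class ListHandler(logging.Handler):
    """Collect formatted records in memory so the example can be inspected."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


base_logger = logging.getLogger("tool_example")
base_logger.setLevel(logging.INFO)
capture = ListHandler()
base_logger.addHandler(capture)


def lambda_handler(event, context):
    # Reuse the caller's ID when present; otherwise generate a new one
    request_id = event.get("request_id", str(uuid.uuid4()))
    log = RequestIdAdapter(base_logger, {"request_id": request_id})
    log.info(f"Received event: {event}")
    result = {"request_id": request_id, "status": "ok"}  # placeholder logic
    log.info(f"Returning result: {result}")
    return result


lambda_handler({"request_id": "req-123"}, None)
for line in capture.lines:
    print(line)
```

&lt;p&gt;Because the adapter rewrites the message itself, the ID appears regardless of which formatter the runtime attaches to the logger.&lt;/p&gt;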

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Use AWS X-Ray for Distributed Tracing: AWS X-Ray is a service that can help you analyze and debug distributed applications, including Lambda functions. By integrating X-Ray with your Bedrock application, you can trace requests as they travel through your Lambda functions and gain insights into their performance and potential issues. [3]&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enable X-Ray tracing for your Lambda functions by adding the necessary configuration.&lt;/li&gt;
&lt;li&gt;Instrument your Lambda functions with X-Ray tracing code to capture relevant information and events.&lt;/li&gt;
&lt;li&gt;Use the X-Ray console or integrate with other monitoring tools to analyze the traces and identify potential bottlenecks or issues.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement Advanced Prompts: With advanced prompts, you can improve your agent's accuracy by modifying the default prompt templates to provide detailed configurations. You can also provide hand-curated examples for few-shot prompting, in which you improve model performance by supplying labeled examples for a specific task. [4]&lt;/p&gt;
&lt;p&gt;By combining the built-in tracing mechanism, custom logging within your Lambda functions, and distributed tracing with AWS X-Ray, you can gain better visibility into the API calls, events, and interactions happening within your Bedrock agent and its associated tools. This can help you debug issues more effectively and trace errors back to their source across multiple Lambda functions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Reference&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/trace-events.html" rel="noopener noreferrer"&gt;Trace events in Amazon Bedrock - Amazon Bedrock&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/operatorguide/best-practices-debugging.html" rel="noopener noreferrer"&gt;Best practices for your debugging environment - AWS Lambda&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/xray/latest/devguide/aws-xray.html" rel="noopener noreferrer"&gt;What is AWS X-Ray? - AWS X-Ray&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/advanced-prompts.html" rel="noopener noreferrer"&gt;Advanced prompts in Amazon Bedrock - Amazon Bedrock&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

</description>
      <category>sagemaker</category>
      <category>aws</category>
      <category>bedrock</category>
    </item>
  </channel>
</rss>
