<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Amit Kayal</title>
    <description>The latest articles on DEV Community by Amit Kayal (@amitkayal).</description>
    <link>https://dev.to/amitkayal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F500645%2Fe0c703c3-855c-4fbd-a1c0-b546a60c022e.png</url>
      <title>DEV Community: Amit Kayal</title>
      <link>https://dev.to/amitkayal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/amitkayal"/>
    <language>en</language>
    <item>
      <title>Google Open Knowledge Format: Why Enterprise Agents Need a Knowledge Layer, Not Just More Tools</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Thu, 18 Jun 2026 06:41:06 +0000</pubDate>
      <link>https://dev.to/aws-builders/google-open-knowledge-format-why-enterprise-agents-need-a-knowledge-layer-not-just-more-tools-je1</link>
      <guid>https://dev.to/aws-builders/google-open-knowledge-format-why-enterprise-agents-need-a-knowledge-layer-not-just-more-tools-je1</guid>
      <description>&lt;h1&gt;
  
  
  Google Open Knowledge Format: Why Enterprise Agents Need a Knowledge Layer, Not Just More Tools
&lt;/h1&gt;

&lt;p&gt;Most enterprise AI conversations still start in the wrong place.&lt;/p&gt;

&lt;p&gt;They start with the model.&lt;/p&gt;

&lt;p&gt;Which model should we use? Which framework should we adopt? Which vendor has the best agent platform? Which tools should we connect next?&lt;/p&gt;

&lt;p&gt;These are fair questions. But in real enterprise architecture, they are not the hardest questions.&lt;/p&gt;

&lt;p&gt;The harder question is this:&lt;/p&gt;

&lt;p&gt;Can our AI systems actually understand how our business works?&lt;/p&gt;

&lt;p&gt;That is why Google Cloud’s article on Open Knowledge Format (OKF) caught my attention. It describes a simple but important idea: representing knowledge in a way that humans can read and machines can use. In OKF, that means markdown for the content and structured metadata for context.&lt;/p&gt;

&lt;p&gt;At first glance, that may sound too simple.&lt;/p&gt;

&lt;p&gt;But that simplicity is the point.&lt;/p&gt;

&lt;p&gt;Enterprises do not need another place where knowledge goes to die. We already have enough portals, catalogs, wikis, dashboards, folders, and internal tools. What we need is a practical way to package knowledge so it can be reviewed, versioned, governed, searched, and reused by both people and AI agents.&lt;/p&gt;

&lt;p&gt;That is where this idea becomes very relevant for agentic AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Enterprise AI Problem
&lt;/h2&gt;

&lt;p&gt;Most organizations already have the knowledge their AI agents need.&lt;/p&gt;

&lt;p&gt;They have it in databases, dashboards, tickets, architecture notes, runbooks, Confluence pages, data catalogs, code comments, incident reports, old project documents, and the heads of experienced employees.&lt;/p&gt;

&lt;p&gt;The issue is not that knowledge does not exist.&lt;/p&gt;

&lt;p&gt;The issue is that it is fragmented.&lt;/p&gt;

&lt;p&gt;Some of it is outdated. Some of it is duplicated. Some of it is tribal. Some of it is locked inside tools. Some of it is written for humans but not structured enough for AI systems to use reliably.&lt;/p&gt;

&lt;p&gt;This becomes a serious problem when we move from AI assistants to AI agents.&lt;/p&gt;

&lt;p&gt;An assistant can give a helpful answer. An agent does more. It plans, selects tools, queries systems, executes steps, generates outputs, and sometimes triggers workflows.&lt;/p&gt;

&lt;p&gt;That means the cost of wrong context is much higher.&lt;/p&gt;

&lt;p&gt;A data agent may know how to generate SQL. But does it know which table is the source of truth?&lt;/p&gt;

&lt;p&gt;A finance agent may calculate revenue. But does it know whether the business means booked revenue, invoiced revenue, recognized revenue, or collected cash?&lt;/p&gt;

&lt;p&gt;A support agent may summarize a customer case. But does it know what customer information must be masked before anything is shared externally?&lt;/p&gt;

&lt;p&gt;A delivery agent may review project status. But does it understand governance rules, escalation paths, release gates, and dependency risks?&lt;/p&gt;

&lt;p&gt;A cloud cost agent may recommend savings. But does it know which environments are production-critical and which ones are safe to shut down?&lt;/p&gt;

&lt;p&gt;Without this context, agents do not become enterprise-ready. They become fast, confident, and risky.&lt;/p&gt;

&lt;h2&gt;
  
  
  More Tools Will Not Solve This
&lt;/h2&gt;

&lt;p&gt;One common mistake in agentic AI is assuming that more tool access means better capability.&lt;/p&gt;

&lt;p&gt;Connect the database.&lt;br&gt;
Connect the CRM.&lt;br&gt;
Connect the ticketing system.&lt;br&gt;
Connect the cloud APIs.&lt;br&gt;
Connect the document repository.&lt;br&gt;
Connect the workflow engine.&lt;/p&gt;

&lt;p&gt;This improves reach, but not necessarily judgment.&lt;/p&gt;

&lt;p&gt;An agent with many tools and weak context can still choose the wrong source, apply the wrong rule, query the wrong table, expose the wrong field, or automate the wrong step.&lt;/p&gt;

&lt;p&gt;That is why I believe every serious enterprise agentic AI framework needs three layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A reasoning layer&lt;/li&gt;
&lt;li&gt;A tool/action layer&lt;/li&gt;
&lt;li&gt;A governed knowledge layer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most teams are investing heavily in the first two. They are testing models, orchestration frameworks, prompts, tools, APIs, and agent workflows.&lt;/p&gt;

&lt;p&gt;That work is needed.&lt;/p&gt;

&lt;p&gt;But the third layer is where enterprise differentiation will come from.&lt;/p&gt;

&lt;p&gt;The model can be changed.&lt;/p&gt;

&lt;p&gt;The tools can be integrated.&lt;/p&gt;

&lt;p&gt;But the organization’s internal knowledge — its definitions, operating rules, business logic, exceptions, ownership, architecture, and lessons learned — is unique.&lt;/p&gt;

&lt;p&gt;That is the real asset.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Google Open Knowledge Format Gets Right
&lt;/h2&gt;

&lt;p&gt;What I like about the Open Knowledge Format idea is that it does not overcomplicate the problem.&lt;/p&gt;

&lt;p&gt;It treats knowledge as something that should be readable, portable, structured, and maintainable.&lt;/p&gt;

&lt;p&gt;Markdown makes it easy for humans to read and contribute. Structured metadata makes it easier for systems and agents to classify, retrieve, and use the knowledge. Version control makes review and audit possible.&lt;/p&gt;

&lt;p&gt;This matters because traditional documentation is passive.&lt;/p&gt;

&lt;p&gt;Someone writes it. Someone may read it. Eventually, it becomes stale.&lt;/p&gt;

&lt;p&gt;Agentic AI needs active knowledge.&lt;/p&gt;

&lt;p&gt;The knowledge has to be available at runtime. It should help the agent decide what to do, what not to do, which source to trust, which rule to apply, and when to escalate.&lt;/p&gt;

&lt;p&gt;A database schema may say that a table has a column called &lt;code&gt;segment&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That is useful, but not enough.&lt;/p&gt;

&lt;p&gt;The agent also needs to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What does &lt;code&gt;segment&lt;/code&gt; mean?&lt;/li&gt;
&lt;li&gt;Who owns the definition?&lt;/li&gt;
&lt;li&gt;Which values are valid?&lt;/li&gt;
&lt;li&gt;Is the field reliable?&lt;/li&gt;
&lt;li&gt;Can it be used for reporting?&lt;/li&gt;
&lt;li&gt;Can it be exposed externally?&lt;/li&gt;
&lt;li&gt;Are there legacy exceptions?&lt;/li&gt;
&lt;li&gt;Which workflows are allowed to use it?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the gap between data access and enterprise intelligence.&lt;/p&gt;

&lt;h2&gt;
  
  
  How This Fits into an Agentic AI Framework
&lt;/h2&gt;

&lt;p&gt;In an enterprise agentic AI framework, I would treat an OKF-like structure as the Enterprise Knowledge Layer.&lt;/p&gt;

&lt;p&gt;This layer should sit between enterprise systems and agent execution.&lt;/p&gt;

&lt;p&gt;The agent should not jump directly from a user request to a tool call. That is where many mistakes happen.&lt;/p&gt;

&lt;p&gt;A better flow is:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User request
   ↓
Agent identifies intent and domain
   ↓
Agent retrieves relevant knowledge
   ↓
Agent checks source of truth, ownership, caveats, access rules, and usage guidance
   ↓
Agent plans the action
   ↓
Agent calls the right tool
   ↓
Agent produces the answer or executes the workflow
   ↓
Agent proposes a knowledge update if reusable learning is found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This changes the quality of execution.&lt;/p&gt;

&lt;p&gt;Take a simple question:&lt;/p&gt;

&lt;p&gt;“Show me revenue by customer segment.”&lt;/p&gt;

&lt;p&gt;A weak agent will search for tables with names like &lt;code&gt;revenue&lt;/code&gt;, &lt;code&gt;customer&lt;/code&gt;, and &lt;code&gt;segment&lt;/code&gt;, then generate SQL.&lt;/p&gt;

&lt;p&gt;A stronger enterprise agent will first check the knowledge layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which revenue table is approved?&lt;/li&gt;
&lt;li&gt;Which revenue definition applies?&lt;/li&gt;
&lt;li&gt;Which customer segment field is trusted?&lt;/li&gt;
&lt;li&gt;Which join is valid?&lt;/li&gt;
&lt;li&gt;Which date logic should be used?&lt;/li&gt;
&lt;li&gt;Are there caveats for legacy accounts?&lt;/li&gt;
&lt;li&gt;Is the user allowed to see this output?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only after that should it query the database.&lt;/p&gt;

&lt;p&gt;That is the difference between automation and governed intelligence.&lt;/p&gt;
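
&lt;p&gt;The knowledge-first flow described in this section can be sketched in a few lines of Python. This is a minimal illustration, not an OKF or agent-framework API: &lt;code&gt;KnowledgeEntry&lt;/code&gt;, &lt;code&gt;KNOWLEDGE_INDEX&lt;/code&gt;, &lt;code&gt;plan_action&lt;/code&gt;, and the table name &lt;code&gt;finance.recognized_revenue&lt;/code&gt; are all hypothetical names chosen for the example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeEntry:
    """Hypothetical, simplified view of one approved knowledge file."""
    source_of_truth: str
    owner: str
    approval_status: str
    caveats: tuple = ()


# Hypothetical index; a real system would load reviewed knowledge files.
KNOWLEDGE_INDEX = {
    "revenue": KnowledgeEntry(
        source_of_truth="finance.recognized_revenue",
        owner="finance-operations",
        approval_status="approved",
        caveats=("Legacy accounts may have missing segment values.",),
    ),
}


def plan_action(domain: str) -> dict:
    """Consult the knowledge layer before choosing any tool call."""
    entry = KNOWLEDGE_INDEX.get(domain)
    if entry is None or entry.approval_status != "approved":
        # No trusted context: escalate instead of guessing a source.
        return {"action": "escalate", "reason": f"no approved knowledge for {domain!r}"}
    return {
        "action": "query",
        "table": entry.source_of_truth,
        "owner": entry.owner,
        "caveats": list(entry.caveats),
    }


print(plan_action("revenue"))
print(plan_action("hr"))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The design choice worth noting is the default: when no approved knowledge exists for a domain, the sketch escalates rather than letting the agent pick a table by name.&lt;/p&gt;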

&lt;h2&gt;
  
  
  Generating Open Knowledge Format from AWS SQL Databases
&lt;/h2&gt;

&lt;p&gt;For AWS SQL environments such as Amazon RDS, Aurora, and Redshift, the starting point is metadata extraction.&lt;/p&gt;

&lt;p&gt;We can automatically extract:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database names&lt;/li&gt;
&lt;li&gt;Schema names&lt;/li&gt;
&lt;li&gt;Table names&lt;/li&gt;
&lt;li&gt;Column names&lt;/li&gt;
&lt;li&gt;Data types&lt;/li&gt;
&lt;li&gt;Primary keys&lt;/li&gt;
&lt;li&gt;Foreign keys&lt;/li&gt;
&lt;li&gt;Indexes&lt;/li&gt;
&lt;li&gt;Nullability&lt;/li&gt;
&lt;li&gt;Row counts&lt;/li&gt;
&lt;li&gt;Table comments&lt;/li&gt;
&lt;li&gt;Column comments&lt;/li&gt;
&lt;li&gt;AWS tags&lt;/li&gt;
&lt;li&gt;Freshness indicators&lt;/li&gt;
&lt;li&gt;Query usage patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS Glue Crawlers and the Glue Data Catalog can help discover and centralize metadata. Database-native sources like &lt;code&gt;information_schema&lt;/code&gt; can provide table and column-level structure.&lt;/p&gt;

&lt;p&gt;But metadata is not knowledge.&lt;/p&gt;

&lt;p&gt;A pipeline can discover that a table has a column called &lt;code&gt;revenue_amount&lt;/code&gt;. It cannot automatically know whether that means booked revenue, recognized revenue, invoiced revenue, or pipeline value. That meaning has to come from finance, sales operations, data owners, or approved documentation.&lt;/p&gt;

&lt;p&gt;So OKF generation should be semi-automated.&lt;/p&gt;

&lt;p&gt;Technical metadata should be generated automatically. Business meaning should be reviewed and approved by the right domain owners.&lt;/p&gt;
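
&lt;p&gt;As a rough sketch of the automated half, the function below turns column metadata into a draft knowledge file. It assumes the rows were already fetched as &lt;code&gt;(column_name, data_type, is_nullable)&lt;/code&gt; tuples from &lt;code&gt;information_schema.columns&lt;/code&gt;. The function name &lt;code&gt;render_draft&lt;/code&gt;, the &lt;code&gt;draft&lt;/code&gt; status value, the sample columns, and the placeholder text are assumptions for illustration, not part of any published OKF specification.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
def render_draft(schema: str, table: str, columns: list) -> str:
    """Render a draft knowledge file from column metadata.

    `columns` holds (column_name, data_type, is_nullable) tuples, the shape
    returned by selecting those fields from information_schema.columns.
    """
    lines = [
        "---",
        f"id: {schema}.{table}",
        "kind: table",
        f"schema: {schema}",
        f"table: {table}",
        "approval_status: draft",  # never "approved" until an owner signs off
        "---",
        "",
        f"# {table}",
        "",
        "## Business Meaning",
        "",
        "TODO: to be written and approved by the domain owner.",
        "",
        "## Columns",
        "",
    ]
    for name, data_type, is_nullable in columns:
        nullable = "nullable" if is_nullable == "YES" else "not null"
        lines.append(f"- {name} ({data_type}, {nullable})")
    return "\n".join(lines) + "\n"


# Hypothetical sample rows for illustration.
draft = render_draft(
    "sales",
    "customer_account",
    [("account_id", "uuid", "NO"), ("segment", "text", "YES")],
)
print(draft)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The business meaning is left as an explicit placeholder so that a reviewer, not the pipeline, fills it in.&lt;/p&gt;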

&lt;p&gt;For every critical SQL table, the knowledge file should capture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Business purpose&lt;/li&gt;
&lt;li&gt;Source system&lt;/li&gt;
&lt;li&gt;Source-of-truth status&lt;/li&gt;
&lt;li&gt;Data owner&lt;/li&gt;
&lt;li&gt;Data steward&lt;/li&gt;
&lt;li&gt;Refresh frequency&lt;/li&gt;
&lt;li&gt;Key columns&lt;/li&gt;
&lt;li&gt;Approved joins&lt;/li&gt;
&lt;li&gt;Common usage patterns&lt;/li&gt;
&lt;li&gt;Prohibited usage&lt;/li&gt;
&lt;li&gt;Data quality rules&lt;/li&gt;
&lt;li&gt;Sensitivity classification&lt;/li&gt;
&lt;li&gt;Agent usage guidance&lt;/li&gt;
&lt;li&gt;Known caveats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A table-level knowledge file should not simply describe the table. It should tell an agent how to use that table safely.&lt;/p&gt;

&lt;p&gt;For example, a knowledge file for a CRM customer account table might look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sales.crm.customer_account&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;table&lt;/span&gt;
&lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aurora-postgresql&lt;/span&gt;
&lt;span class="na"&gt;cloud&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws&lt;/span&gt;
&lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;crm_prod&lt;/span&gt;
&lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sales&lt;/span&gt;
&lt;span class="na"&gt;table&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer_account&lt;/span&gt;
&lt;span class="na"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sales&lt;/span&gt;
&lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;revenue-operations&lt;/span&gt;
&lt;span class="na"&gt;steward&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;data-platform-team&lt;/span&gt;
&lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
&lt;span class="na"&gt;classification&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;confidential&lt;/span&gt;
&lt;span class="na"&gt;pii&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;freshness_sla&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;15&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;minutes"&lt;/span&gt;
&lt;span class="na"&gt;source_system&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;salesforce&lt;/span&gt;
&lt;span class="na"&gt;agent_usage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allowed_with_row_level_controls&lt;/span&gt;
&lt;span class="na"&gt;approval_status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;approved&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# customer_account&lt;/span&gt;

&lt;span class="gu"&gt;## Business Meaning&lt;/span&gt;

This table represents customer account records synchronized from Salesforce into the CRM production database.

It is the approved source for account ownership, account segment, customer lifecycle stage, and sales territory mapping.

&lt;span class="gu"&gt;## Agent Usage Guidance&lt;/span&gt;

Use this table for customer account analysis, sales ownership, account segmentation, and lifecycle stage reporting.

Do not use this table for audited revenue reporting, invoice reconciliation, or financial close reporting.

For revenue reporting, use the approved finance revenue table.

&lt;span class="gu"&gt;## Important Caveats&lt;/span&gt;

Some legacy accounts may have missing segment values. Agents must not infer missing segment values without confirmation.

This table contains confidential customer information. Agents must apply row-level access controls and masking rules where applicable.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is much more useful than a raw schema.&lt;/p&gt;

&lt;p&gt;The schema tells the agent what exists.&lt;/p&gt;

&lt;p&gt;The knowledge file tells the agent how to use it.&lt;/p&gt;
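
&lt;p&gt;To make that concrete, here is a minimal sketch of how an agent runtime might read the front matter of such a file before touching the table. The parser only handles flat &lt;code&gt;key: value&lt;/code&gt; lines, which is an assumption that fits the example above; a real pipeline would more likely use a full YAML parser. The function names &lt;code&gt;parse_front_matter&lt;/code&gt; and &lt;code&gt;may_query&lt;/code&gt; and the rule they encode are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
def parse_front_matter(text: str) -> dict:
    """Parse flat `key: value` front matter between the first two `---` lines.

    Deliberately minimal: no nesting or lists. A real pipeline would
    use a full YAML parser instead.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return meta


def may_query(meta: dict) -> bool:
    """Allow access only for approved files that do not forbid agent use."""
    return (
        meta.get("approval_status") == "approved"
        and meta.get("agent_usage", "").startswith("allowed")
    )


doc = """---
id: sales.crm.customer_account
pii: true
agent_usage: allowed_with_row_level_controls
approval_status: approved
---

# customer_account
"""

meta = parse_front_matter(doc)
print(may_query(meta), meta["pii"])
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;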

&lt;h2&gt;
  
  
  Generating Open Knowledge Format from AWS NoSQL Databases
&lt;/h2&gt;

&lt;p&gt;NoSQL systems need even more care.&lt;/p&gt;

&lt;p&gt;In DynamoDB, the table name and attributes rarely tell the full story. The real design is usually in the access patterns.&lt;/p&gt;

&lt;p&gt;A DynamoDB table may store multiple entity types. It may use composite keys. It may depend on global secondary indexes. It may be optimized for specific queries and unsuitable for others.&lt;/p&gt;

&lt;p&gt;If an agent does not understand this, it can misuse the table, trigger inefficient scans, produce incomplete answers, or misunderstand the business process.&lt;/p&gt;

&lt;p&gt;For DynamoDB, the knowledge file should capture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Table purpose&lt;/li&gt;
&lt;li&gt;Partition key&lt;/li&gt;
&lt;li&gt;Sort key&lt;/li&gt;
&lt;li&gt;Item types&lt;/li&gt;
&lt;li&gt;Common item shapes&lt;/li&gt;
&lt;li&gt;Global secondary indexes&lt;/li&gt;
&lt;li&gt;Local secondary indexes&lt;/li&gt;
&lt;li&gt;Streams&lt;/li&gt;
&lt;li&gt;TTL rules&lt;/li&gt;
&lt;li&gt;Primary access patterns&lt;/li&gt;
&lt;li&gt;Anti-patterns&lt;/li&gt;
&lt;li&gt;Sensitive attributes&lt;/li&gt;
&lt;li&gt;Agent permissions&lt;/li&gt;
&lt;li&gt;Operational caveats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most important part is access pattern documentation.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Access pattern 1:
Get all events for an order
PK = order_id
SK = event_timestamp

Access pattern 2:
Get latest order status
PK = order_id
Sort descending by event_timestamp
Limit = 1

Access pattern 3:
Investigate failed payment events
Query the event_type index (a global secondary index) if one exists, rather than scanning the table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells the agent how the table is actually meant to be used.&lt;/p&gt;
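
&lt;p&gt;Access pattern 2 maps directly onto DynamoDB Query parameters. The sketch below only builds the request dictionary, in the shape accepted by the low-level boto3 client's &lt;code&gt;query&lt;/code&gt; call, and does not contact AWS. It assumes &lt;code&gt;order_id&lt;/code&gt; is stored as a string attribute; the function name &lt;code&gt;latest_status_query&lt;/code&gt; and the sample order ID are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
def latest_status_query(order_id: str) -> dict:
    """Build Query parameters for access pattern 2 (latest order status).

    The dict follows the DynamoDB Query request shape used by the
    low-level boto3 client, e.g. client.query(**params).
    """
    return {
        "TableName": "order_events",
        "KeyConditionExpression": "order_id = :oid",
        # Assumption: order_id is a string ("S") attribute.
        "ExpressionAttributeValues": {":oid": {"S": order_id}},
        "ScanIndexForward": False,  # newest event_timestamp first
        "Limit": 1,
    }


params = latest_status_query("ORD-1001")
print(params)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the key condition targets a single partition and the sort order is reversed, the request reads one item instead of scanning the order history.&lt;/p&gt;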

&lt;p&gt;A knowledge file for an order events table built around these access patterns might look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;commerce.dynamodb.order_events&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nosql_table&lt;/span&gt;
&lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dynamodb&lt;/span&gt;
&lt;span class="na"&gt;cloud&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws&lt;/span&gt;
&lt;span class="na"&gt;table&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;order_events&lt;/span&gt;
&lt;span class="na"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;commerce&lt;/span&gt;
&lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;order-platform-team&lt;/span&gt;
&lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
&lt;span class="na"&gt;partition_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;order_id&lt;/span&gt;
&lt;span class="na"&gt;sort_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;event_timestamp&lt;/span&gt;
&lt;span class="na"&gt;billing_mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PAY_PER_REQUEST&lt;/span&gt;
&lt;span class="na"&gt;stream_enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;classification&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;confidential&lt;/span&gt;
&lt;span class="na"&gt;pii&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="na"&gt;agent_usage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allowed_read_only&lt;/span&gt;
&lt;span class="na"&gt;approval_status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;approved&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# order_events&lt;/span&gt;

&lt;span class="gu"&gt;## Business Meaning&lt;/span&gt;

This table stores the event history of customer orders.

Each item represents an event in the order lifecycle, such as order created, payment completed, shipment initiated, shipment delivered, cancellation requested, or refund completed.

&lt;span class="gu"&gt;## Primary Access Pattern&lt;/span&gt;

Retrieve the event timeline for a specific order.

PK = order_id  
SK = event_timestamp

&lt;span class="gu"&gt;## Agent Usage Guidance&lt;/span&gt;

Agents should use this table to reconstruct order history, check operational status, and investigate order workflow issues.

Agents should not use this table as the financial source of truth for revenue, refunds, or payment settlement.

&lt;span class="gu"&gt;## Caveats&lt;/span&gt;

This table is append-only.

The latest operational event should not be treated as financial completion. For accounting status, agents must check the finance ledger.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This helps keep an agent from treating a DynamoDB table like a normalized relational model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generating Knowledge from Document Databases
&lt;/h2&gt;

&lt;p&gt;For document databases such as Amazon DocumentDB or MongoDB-compatible systems, the main challenge is flexible structure and nested sensitive data.&lt;/p&gt;

&lt;p&gt;A support case document may contain customer messages. A customer profile may contain personal data. A workflow document may include internal comments, commercial terms, or escalation notes.&lt;/p&gt;

&lt;p&gt;Agents need clear rules before reading, summarizing, or exposing this type of content.&lt;/p&gt;

&lt;p&gt;For document collections, the knowledge file should capture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Collection purpose&lt;/li&gt;
&lt;li&gt;Common document structure&lt;/li&gt;
&lt;li&gt;Required fields&lt;/li&gt;
&lt;li&gt;Optional fields&lt;/li&gt;
&lt;li&gt;Nested arrays&lt;/li&gt;
&lt;li&gt;Indexes&lt;/li&gt;
&lt;li&gt;Query patterns&lt;/li&gt;
&lt;li&gt;Sensitive fields&lt;/li&gt;
&lt;li&gt;Masking rules&lt;/li&gt;
&lt;li&gt;Retention rules&lt;/li&gt;
&lt;li&gt;Allowed agent use cases&lt;/li&gt;
&lt;li&gt;Prohibited use cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, a knowledge file for a support case collection might look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;support.documentdb.customer_cases&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;document_collection&lt;/span&gt;
&lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;documentdb&lt;/span&gt;
&lt;span class="na"&gt;cloud&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws&lt;/span&gt;
&lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;support_prod&lt;/span&gt;
&lt;span class="na"&gt;collection&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer_cases&lt;/span&gt;
&lt;span class="na"&gt;domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;customer-support&lt;/span&gt;
&lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;support-platform-team&lt;/span&gt;
&lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
&lt;span class="na"&gt;classification&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;restricted&lt;/span&gt;
&lt;span class="na"&gt;pii&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;agent_usage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allowed_with_masking&lt;/span&gt;
&lt;span class="na"&gt;approval_status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;approved&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# customer_cases&lt;/span&gt;

&lt;span class="gu"&gt;## Business Meaning&lt;/span&gt;

This collection stores customer support cases raised through web, email, account manager, and internal escalation channels.

It is used to view case history, identify recurring issues, and prepare escalation summaries.

&lt;span class="gu"&gt;## Agent Usage Guidance&lt;/span&gt;

Agents can use this collection for internal case summaries, issue classification, support briefings, and next-action recommendations.

Agents must not expose raw customer messages externally without masking sensitive information.

&lt;span class="gu"&gt;## Sensitive Fields&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; customer_email
&lt;span class="p"&gt;-&lt;/span&gt; phone_number
&lt;span class="p"&gt;-&lt;/span&gt; messages.message
&lt;span class="p"&gt;-&lt;/span&gt; account_id
&lt;span class="p"&gt;-&lt;/span&gt; internal_notes

&lt;span class="gu"&gt;## Caveats&lt;/span&gt;

Free-text messages may contain sensitive personal or commercial information. Agents should summarize the issue and intent rather than copying raw text.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not just documentation. It is a guardrail.&lt;/p&gt;
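
&lt;p&gt;One way such a guardrail could be enforced in code is to mask the listed fields before any document reaches the agent's output. The sketch below takes the dotted paths from the Sensitive Fields list in the knowledge file above. The helper names, the &lt;code&gt;[MASKED]&lt;/code&gt; placeholder, and the sample document are assumptions for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import copy

# Dotted paths taken from the Sensitive Fields list in the knowledge file.
SENSITIVE_FIELDS = [
    "customer_email",
    "phone_number",
    "messages.message",
    "account_id",
    "internal_notes",
]


def _mask_path(node, parts):
    """Mask the value at a dotted path, descending into lists of documents."""
    if isinstance(node, list):
        for item in node:
            _mask_path(item, parts)
    elif isinstance(node, dict) and parts[0] in node:
        if len(parts) == 1:
            node[parts[0]] = "[MASKED]"
        else:
            _mask_path(node[parts[0]], parts[1:])


def mask_document(doc: dict) -> dict:
    """Return a masked copy; the original document is left untouched."""
    masked = copy.deepcopy(doc)
    for path in SENSITIVE_FIELDS:
        _mask_path(masked, path.split("."))
    return masked


# Hypothetical sample document.
case = {
    "case_id": "C-42",
    "customer_email": "jane@example.com",
    "messages": [{"sender": "customer", "message": "My card was declined."}],
}
print(mask_document(case))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;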

&lt;h2&gt;
  
  
  The AWS Pipeline I Would Build
&lt;/h2&gt;

&lt;p&gt;I would not make this a manual documentation exercise.&lt;/p&gt;

&lt;p&gt;That will not scale.&lt;/p&gt;

&lt;p&gt;I would build a pipeline that generates draft knowledge files automatically, then routes critical content for human review.&lt;/p&gt;

&lt;p&gt;A practical AWS-based pipeline would look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AWS data sources
   ↓
Metadata extraction
   ↓
Profiling and classification
   ↓
Business enrichment
   ↓
OKF draft generation
   ↓
Human review and approval
   ↓
Git-based version control
   ↓
Indexing into agent knowledge retrieval
   ↓
Agent execution
   ↓
Feedback loop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The extraction layer would pull from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS Glue Data Catalog&lt;/li&gt;
&lt;li&gt;RDS and Aurora metadata&lt;/li&gt;
&lt;li&gt;Redshift catalog tables&lt;/li&gt;
&lt;li&gt;DynamoDB DescribeTable&lt;/li&gt;
&lt;li&gt;DynamoDB exports to S3&lt;/li&gt;
&lt;li&gt;DocumentDB collection profiling&lt;/li&gt;
&lt;li&gt;AWS tags&lt;/li&gt;
&lt;li&gt;CloudWatch metrics&lt;/li&gt;
&lt;li&gt;IAM and Lake Formation policies&lt;/li&gt;
&lt;li&gt;Existing documentation and runbooks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The enrichment layer would add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Business definitions&lt;/li&gt;
&lt;li&gt;Source-of-truth mapping&lt;/li&gt;
&lt;li&gt;Ownership&lt;/li&gt;
&lt;li&gt;Usage guidance&lt;/li&gt;
&lt;li&gt;Sensitivity classification&lt;/li&gt;
&lt;li&gt;Approved joins&lt;/li&gt;
&lt;li&gt;Known caveats&lt;/li&gt;
&lt;li&gt;Agent-specific instructions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The governance layer would make sure the knowledge is trusted before agents rely on it.&lt;/p&gt;

&lt;p&gt;This is important.&lt;/p&gt;

&lt;p&gt;If we automate everything without review, we risk creating wrong knowledge at scale.&lt;/p&gt;

&lt;p&gt;If we manually write everything, we will never scale.&lt;/p&gt;

&lt;p&gt;The practical answer is auto-generation with human governance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Governance Is Not Optional
&lt;/h2&gt;

&lt;p&gt;Enterprise agents need trust boundaries.&lt;/p&gt;

&lt;p&gt;Every knowledge file should have ownership and lifecycle metadata.&lt;/p&gt;

&lt;p&gt;At minimum:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;finance-operations&lt;/span&gt;
&lt;span class="na"&gt;steward&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;data-platform-team&lt;/span&gt;
&lt;span class="na"&gt;classification&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;confidential&lt;/span&gt;
&lt;span class="na"&gt;approval_status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;approved&lt;/span&gt;
&lt;span class="na"&gt;agent_usage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;allowed_internal_only&lt;/span&gt;
&lt;span class="na"&gt;last_reviewed_at&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-06-18"&lt;/span&gt;
&lt;span class="na"&gt;next_review_due&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-09-18"&lt;/span&gt;
&lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;high&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives the framework accountability.&lt;/p&gt;

&lt;p&gt;If an agent uses a revenue definition, we should know who approved it.&lt;/p&gt;

&lt;p&gt;If an agent queries a customer table, we should know whether it contains PII.&lt;/p&gt;

&lt;p&gt;If an agent summarizes a support case, we should know what masking rules apply.&lt;/p&gt;

&lt;p&gt;Governance is not bureaucracy here.&lt;/p&gt;

&lt;p&gt;Governance is what allows agentic AI to move from demo to production.&lt;/p&gt;
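
&lt;p&gt;The lifecycle fields can also be checked mechanically at retrieval time. The sketch below assumes the front matter has already been loaded into a dictionary and treats a file as trusted only when it is approved and its review date has not passed. The function name &lt;code&gt;is_trusted&lt;/code&gt; and this particular rule are assumptions, not a prescribed OKF policy.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
from datetime import date


def is_trusted(meta: dict, today: date) -> bool:
    """Trust a knowledge file only if it is approved and not overdue for review."""
    if meta.get("approval_status") != "approved":
        return False
    due = date.fromisoformat(meta["next_review_due"])
    return due >= today


meta = {
    "owner": "finance-operations",
    "approval_status": "approved",
    "next_review_due": "2026-09-18",
}
print(is_trusted(meta, date(2026, 7, 1)))
print(is_trusted(meta, date(2026, 10, 1)))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A stale file does not have to be hidden from the agent; it could instead be returned with a warning, which is a policy decision for the domain owner.&lt;/p&gt;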

&lt;h2&gt;
  
  
  The Feedback Loop
&lt;/h2&gt;

&lt;p&gt;The best part of this approach is that the knowledge layer can improve over time.&lt;/p&gt;

&lt;p&gt;Agents should not only consume knowledge. They should help identify gaps in it.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A data agent discovers that two tables have conflicting definitions.&lt;/li&gt;
&lt;li&gt;A support agent identifies a recurring customer exception.&lt;/li&gt;
&lt;li&gt;A delivery agent finds that a release checklist is outdated.&lt;/li&gt;
&lt;li&gt;A FinOps agent identifies an untagged resource pattern.&lt;/li&gt;
&lt;li&gt;A sales agent finds that a metric definition is ambiguous.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But agents should not directly overwrite approved knowledge.&lt;/p&gt;

&lt;p&gt;They should propose updates.&lt;/p&gt;

&lt;p&gt;The workflow should be:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent identifies reusable learning
   ↓
Agent creates OKF update proposal
   ↓
Domain owner reviews
   ↓
Approved change is merged
   ↓
Knowledge index is refreshed
   ↓
Future agents use improved context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This turns agentic AI into a learning operating model, not just task automation.&lt;/p&gt;
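
&lt;p&gt;The propose-then-review rule can be expressed as a small state change that only the domain owner is allowed to make. This is a sketch under stated assumptions: &lt;code&gt;UpdateProposal&lt;/code&gt;, &lt;code&gt;review&lt;/code&gt;, and the status values are hypothetical, and a real implementation would more likely be a pull request against the Git repository that holds the knowledge files.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
from dataclasses import dataclass


@dataclass
class UpdateProposal:
    """Hypothetical record of an agent-suggested change to a knowledge file."""
    knowledge_id: str
    proposed_by: str
    summary: str
    status: str = "pending_review"


def review(proposal: UpdateProposal, reviewer: str, owner: str, approve: bool) -> UpdateProposal:
    """Only the domain owner can move a proposal out of pending_review."""
    if reviewer != owner:
        raise PermissionError(f"{reviewer} does not own {proposal.knowledge_id}")
    proposal.status = "approved" if approve else "rejected"
    return proposal


proposal = UpdateProposal(
    knowledge_id="sales.crm.customer_account",
    proposed_by="data-agent",
    summary="Segment values conflict with a second table.",
)
review(proposal, reviewer="revenue-operations", owner="revenue-operations", approve=True)
print(proposal.status)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;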

&lt;h2&gt;
  
  
  Where I Would Start
&lt;/h2&gt;

&lt;p&gt;I would not start by documenting the whole enterprise.&lt;/p&gt;

&lt;p&gt;That sounds ambitious, but it is usually a bad execution plan.&lt;/p&gt;

&lt;p&gt;It becomes a documentation program, not an AI acceleration program.&lt;/p&gt;

&lt;p&gt;I would start with one high-value domain where correctness matters.&lt;/p&gt;

&lt;p&gt;Good candidates are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Revenue analytics&lt;/li&gt;
&lt;li&gt;Customer support&lt;/li&gt;
&lt;li&gt;Delivery governance&lt;/li&gt;
&lt;li&gt;Cloud cost optimization&lt;/li&gt;
&lt;li&gt;Sales operations&lt;/li&gt;
&lt;li&gt;Data quality monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the first MVP, I would select 10 to 20 high-value datasets and generate knowledge files around them.&lt;/p&gt;

&lt;p&gt;The MVP should include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;AWS metadata extraction&lt;/li&gt;
&lt;li&gt;Draft OKF generation&lt;/li&gt;
&lt;li&gt;Manual business enrichment&lt;/li&gt;
&lt;li&gt;Data owner approval&lt;/li&gt;
&lt;li&gt;Git-based versioning&lt;/li&gt;
&lt;li&gt;Agent retrieval integration&lt;/li&gt;
&lt;li&gt;Measurement of answer quality and tool accuracy&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The goal is not to create perfect documentation.&lt;/p&gt;

&lt;p&gt;The goal is to make agents more accurate, more governed, and more useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Would Measure Success
&lt;/h2&gt;

&lt;p&gt;The wrong metric is “number of OKF files created.”&lt;/p&gt;

&lt;p&gt;That only measures documentation volume.&lt;/p&gt;

&lt;p&gt;The right metrics are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduction in incorrect SQL generation&lt;/li&gt;
&lt;li&gt;Reduction in wrong source-of-truth usage&lt;/li&gt;
&lt;li&gt;Increase in agent answer accuracy&lt;/li&gt;
&lt;li&gt;Reduction in human corrections&lt;/li&gt;
&lt;li&gt;Increase in approved knowledge reuse&lt;/li&gt;
&lt;li&gt;Reduction in deprecated table usage&lt;/li&gt;
&lt;li&gt;Improvement in data discovery time&lt;/li&gt;
&lt;li&gt;Fewer governance violations&lt;/li&gt;
&lt;li&gt;Higher user trust in agent outputs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The question should always be:&lt;/p&gt;

&lt;p&gt;Did the knowledge layer make the agent better?&lt;/p&gt;

&lt;p&gt;If not, we are just creating another documentation repository.&lt;/p&gt;
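
&lt;p&gt;One simple way to answer that question is to run the same evaluation set with and without the knowledge layer and compare error rates. The sketch below assumes such an evaluation exists; the counts are made up for illustration and the function names are hypothetical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def error_rate(errors, total):
    """Share of evaluated agent outputs that were wrong."""
    if total == 0:
        raise ValueError("no evaluated outputs")
    return errors / total


def relative_reduction(before, after):
    """Fractional drop in error rate; a negative value means things got worse."""
    if before == 0:
        raise ValueError("baseline error rate is zero")
    return (before - after) / before


# Made-up evaluation counts, for illustration only.
baseline = error_rate(errors=30, total=200)        # agent without the knowledge layer
with_knowledge = error_rate(errors=12, total=200)  # same questions, with the knowledge layer
improvement = relative_reduction(baseline, with_knowledge)
print(f"Incorrect SQL rate: {baseline:.1%} to {with_knowledge:.1%} ({improvement:.0%} reduction)")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The same two functions can be reused for the other metrics in the list, such as deprecated table usage or human corrections, as long as each one is counted on a fixed evaluation set.&lt;/p&gt;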

&lt;h2&gt;
  
  
  Final View
&lt;/h2&gt;

&lt;p&gt;Google Open Knowledge Format is not interesting because Markdown and YAML are new.&lt;/p&gt;

&lt;p&gt;They are not.&lt;/p&gt;

&lt;p&gt;It is interesting because it points to one of the most important problems in enterprise AI: how to make organizational knowledge usable by agents without locking it inside one platform.&lt;/p&gt;

&lt;p&gt;In an AWS environment, we already have many of the raw signals: RDS schemas, Aurora metadata, Redshift catalogs, DynamoDB keys and indexes, Glue Data Catalog, S3 exports, CloudWatch metrics, tags, IAM policies, Lake Formation rules, and existing documentation.&lt;/p&gt;

&lt;p&gt;The opportunity is to convert these scattered signals into a governed Enterprise Knowledge Layer.&lt;/p&gt;

&lt;p&gt;That layer becomes the memory and context foundation for agentic AI.&lt;/p&gt;

&lt;p&gt;My view is simple:&lt;/p&gt;

&lt;p&gt;Models give agents reasoning power.&lt;/p&gt;

&lt;p&gt;Tools give agents execution power.&lt;/p&gt;

&lt;p&gt;Knowledge gives agents enterprise judgment.&lt;/p&gt;

&lt;p&gt;Without that knowledge layer, agentic AI will remain impressive in demos and fragile in production.&lt;/p&gt;

&lt;p&gt;With it, enterprises can build agents that do not just act fast, but act correctly, safely, and in alignment with how the business actually works.&lt;/p&gt;

</description>
      <category>enterpriseai</category>
      <category>agentskills</category>
      <category>aws</category>
      <category>agentknowledgemanagement</category>
    </item>
    <item>
      <title>Takeaway from AWS Generative AI Lens</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Thu, 11 Jun 2026 05:41:26 +0000</pubDate>
      <link>https://dev.to/aws-builders/takeway-from-aws-generative-ai-lens-14dj</link>
      <guid>https://dev.to/aws-builders/takeway-from-aws-generative-ai-lens-14dj</guid>
      <description>&lt;p&gt;I was going through the &lt;a href="https://docs.aws.amazon.com/wellarchitected/latest/generative-ai-lens/scenarios.html" rel="noopener noreferrer"&gt;AWS Generative AI Lens&lt;/a&gt; recently, especially the sections on agentic AI, real-world scenarios, and cost optimization.&lt;/p&gt;

&lt;p&gt;My biggest takeaway was simple:&lt;/p&gt;

&lt;p&gt;Enterprise AI is not about adding a chatbot everywhere.&lt;/p&gt;

&lt;p&gt;It is about deciding where AI should assist, where it should reason, and where it should actually take action.&lt;/p&gt;

&lt;p&gt;That difference is important.&lt;/p&gt;

&lt;p&gt;A chatbot is mostly a conversation layer. It answers questions, summarizes information, or helps users find something faster. Useful, yes. But limited.&lt;/p&gt;

&lt;p&gt;Agentic AI goes one step further. It can understand a goal, collect context, use tools, call APIs, check results, and continue the workflow. That is where things become powerful, but also risky.&lt;/p&gt;

&lt;p&gt;And honestly, I think this is where many teams will make mistakes.&lt;/p&gt;

&lt;p&gt;They will jump directly into “autonomous agents” because it sounds advanced. But in most enterprise systems, full autonomy should not be the first step.&lt;/p&gt;

&lt;p&gt;A lot of use cases only need controlled AI-assisted workflows.&lt;/p&gt;

&lt;p&gt;For example, if a support ticket comes in, AI can classify the issue, extract the important details, check the knowledge base, and suggest the next action. That does not need a fully autonomous agent. It needs a reliable workflow with a few intelligent decision points.&lt;/p&gt;

&lt;p&gt;The same applies to document processing, internal knowledge search, compliance checks, report generation, and many operational tasks.&lt;/p&gt;

&lt;p&gt;Start simple. Keep the flow predictable. Add autonomy only where the process genuinely needs it.&lt;/p&gt;

&lt;p&gt;The AWS scenarios helped me think about this more practically. Use cases like autonomous call centers, generative BI, incident response, code review, Kanban workflows, and knowledge-worker copilots are not just “AI features.” They are business workflows being redesigned with AI inside them.&lt;/p&gt;

&lt;p&gt;That is the right mental model.&lt;/p&gt;

&lt;p&gt;Take generative BI as an example.&lt;/p&gt;

&lt;p&gt;The value is not just that a user can ask a question in English and get a chart. The real value is that business users can get answers without always depending on analysts or knowing the database structure.&lt;/p&gt;

&lt;p&gt;But this only works if the data layer is governed properly. Access control, semantic consistency, row-level security, auditability, and accuracy all matter. Otherwise, the system may produce a confident answer that is wrong, or worse, expose information to the wrong person.&lt;/p&gt;

&lt;p&gt;So the hard part is not the natural language interface.&lt;/p&gt;

&lt;p&gt;The hard part is trust.&lt;/p&gt;

&lt;p&gt;Incident response is another strong example.&lt;/p&gt;

&lt;p&gt;During an incident, engineers waste a lot of time moving between dashboards, logs, alerts, deployment history, tickets, and runbooks. An AI-assisted incident system can collect this context quickly, summarize what changed, compare with past incidents, and suggest likely causes.&lt;/p&gt;

&lt;p&gt;That is a good use case for agentic behavior because incidents are rarely linear. The system needs to investigate, adjust, and reason through incomplete information.&lt;/p&gt;

&lt;p&gt;But even there, I would not give the agent full production control on day one.&lt;/p&gt;

&lt;p&gt;Let it investigate first.&lt;/p&gt;

&lt;p&gt;Then let it recommend.&lt;/p&gt;

&lt;p&gt;Then automate low-risk actions.&lt;/p&gt;

&lt;p&gt;Only once there is enough confidence, allow higher-impact actions, and keep human approval in place for them.&lt;/p&gt;

&lt;p&gt;That is how autonomy should mature in the enterprise.&lt;/p&gt;
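
&lt;p&gt;Those stages can be written down as an explicit policy rather than left implicit in prompts. The following Python sketch is only an illustration; the level names, risk labels, and return values are assumptions of mine, not something defined by the AWS lens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Autonomy levels in the order described above; names are illustrative.
LEVELS = ["investigate", "recommend", "automate_low_risk", "act_with_approval"]


def decide(level, action_risk, human_approved=False):
    """Return what the agent may do with a proposed action at a given maturity level."""
    if level not in LEVELS:
        raise ValueError(f"unknown autonomy level: {level}")
    if level == "investigate":
        return "collect_context_only"
    if level == "recommend":
        return "suggest_to_human"
    if action_risk == "low":
        return "execute"
    # Higher-impact actions are only available at the last level, and only with sign-off.
    if level == "act_with_approval" and human_approved:
        return "execute"
    return "suggest_to_human"


print(decide("automate_low_risk", "low"))
print(decide("act_with_approval", "high", human_approved=False))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the policy in one small function makes it easy to review, test, and tighten when an incident shows that a level was granted too early.&lt;/p&gt;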

&lt;p&gt;Another important AWS scenario is the multi-tenant generative AI platform. I think this will become very relevant for larger organizations.&lt;/p&gt;

&lt;p&gt;Without a shared platform, every team starts building its own AI stack. One team builds its own RAG pipeline. Another team creates its own prompt management. Another team handles model access differently. Cost tracking becomes scattered. Guardrails become inconsistent. Security reviews become repetitive.&lt;/p&gt;

&lt;p&gt;That does not scale.&lt;/p&gt;

&lt;p&gt;A central AI platform can solve this by providing reusable capabilities: model access, retrieval, evaluation, guardrails, monitoring, cost visibility, and deployment patterns.&lt;/p&gt;

&lt;p&gt;But it should not become a slow central gatekeeper.&lt;/p&gt;

&lt;p&gt;The better approach is platform plus autonomy. The central team provides the foundation. Business and product teams build specific use cases on top.&lt;/p&gt;

&lt;p&gt;That is how companies can move fast without creating AI chaos.&lt;/p&gt;

&lt;p&gt;The cost optimization section was also a good reminder.&lt;/p&gt;

&lt;p&gt;Generative AI cost behaves differently from traditional software cost.&lt;/p&gt;

&lt;p&gt;In a normal application, one user action may trigger one API call or one database query.&lt;/p&gt;

&lt;p&gt;In an agentic system, one user request can trigger multiple model calls, retrieval steps, tool calls, retries, evaluations, and follow-up reasoning loops.&lt;/p&gt;

&lt;p&gt;If you do not design boundaries, cost can grow silently.&lt;/p&gt;

&lt;p&gt;This is why model selection matters. Not every task needs the largest model. A classification task, summarization task, routing task, and deep reasoning task may need different models.&lt;/p&gt;

&lt;p&gt;The mature question is not:&lt;/p&gt;

&lt;p&gt;“Which model is best?”&lt;/p&gt;

&lt;p&gt;The better question is:&lt;/p&gt;

&lt;p&gt;“Which model is good enough for this task at the right cost, latency, and reliability?”&lt;/p&gt;

&lt;p&gt;That is a very different engineering mindset.&lt;/p&gt;
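
&lt;p&gt;As a rough sketch of that mindset, the selection can be reduced to choosing the cheapest model that has been evaluated as good enough for the task type. The catalog below is entirely hypothetical (names, prices, and suitability sets are made up), and latency and reliability are left out to keep it short:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical model catalog: names, prices, and task suitability are made up.
MODELS = [
    {"name": "small", "cost_per_1k_tokens": 0.2, "good_for": {"classification", "routing"}},
    {"name": "medium", "cost_per_1k_tokens": 1.0,
     "good_for": {"classification", "routing", "summarization"}},
    {"name": "large", "cost_per_1k_tokens": 8.0,
     "good_for": {"classification", "routing", "summarization", "deep_reasoning"}},
]


def pick_model(task_type, catalog=MODELS):
    """Return the cheapest model that evaluation has marked good enough for the task."""
    candidates = [m for m in catalog if task_type in m["good_for"]]
    if not candidates:
        raise ValueError(f"no model approved for task type: {task_type}")
    return min(candidates, key=lambda m: m["cost_per_1k_tokens"])


for task in ["routing", "summarization", "deep_reasoning"]:
    print(task, pick_model(task)["name"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The important design choice is that the suitability sets come from evaluation results, so a model only becomes a candidate for a task once it has been shown to be good enough.&lt;/p&gt;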

&lt;p&gt;Cost-aware AI design also means keeping prompts clean, controlling response length, optimizing vector stores, caching repeated results, setting iteration limits, and defining clear exit conditions for agents.&lt;/p&gt;

&lt;p&gt;Especially for agents, workflow boundaries are critical.&lt;/p&gt;

&lt;p&gt;An agent should not keep thinking, retrying, searching, or calling tools without a clear stop condition. That is not intelligence. That is bad engineering.&lt;/p&gt;
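
&lt;p&gt;A minimal sketch of such boundaries, assuming each step makes exactly one tool call: the loop below always terminates, either because the step reports it is done, because the step limit is reached, or because the tool-call budget is used up. The function names and default limits are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def run_agent(goal, step_fn, max_steps=5, max_tool_calls=8):
    """Run an agent loop that always stops: on success, on step limit, or on tool budget."""
    tool_calls_left = max_tool_calls
    history = []
    for _ in range(max_steps):
        if tool_calls_left == 0:
            return {"status": "stopped_tool_budget", "history": history}
        result = step_fn(goal, history)  # one reasoning step that makes one tool call
        tool_calls_left -= 1
        history.append(result)
        if result.get("done"):
            return {"status": "completed", "history": history}
    return {"status": "stopped_step_limit", "history": history}


def never_finishes(goal, history):
    """A stand-in step that keeps searching without reaching an answer."""
    return {"done": False, "note": f"searched again for {goal}"}


outcome = run_agent("root cause of latency spike", never_finishes)
print(outcome["status"], len(outcome["history"]))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Returning an explicit stop status, instead of silently giving up, also gives the cost dashboard something to count.&lt;/p&gt;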

&lt;p&gt;My final takeaway is this:&lt;/p&gt;

&lt;p&gt;AI should be treated as part of the production architecture, not as a side experiment.&lt;/p&gt;

&lt;p&gt;The companies that succeed will not be the ones with the flashiest demos. They will be the ones that understand where AI fits into the actual workflow, how much autonomy is safe, what controls are needed, and whether the cost is justified by the outcome.&lt;/p&gt;

&lt;p&gt;For me, the real question is no longer:&lt;/p&gt;

&lt;p&gt;“How do we add AI to this product?”&lt;/p&gt;

&lt;p&gt;The better question is:&lt;/p&gt;

&lt;p&gt;“Which workflow can become smarter, faster, or more scalable if AI is designed into it properly?”&lt;/p&gt;

&lt;p&gt;That is where agentic AI becomes useful.&lt;/p&gt;

&lt;p&gt;Not as hype.&lt;/p&gt;

&lt;p&gt;As architecture.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>agentskills</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Hosting MCP Gateway Registry on AWS ECS: A Practical Blueprint for Enterprise Agentic AI Systems</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Sun, 24 May 2026 09:13:03 +0000</pubDate>
      <link>https://dev.to/aws-builders/hosting-mcp-gateway-registry-on-aws-ecs-a-practical-blueprint-for-enterprise-agentic-ai-systems-18a4</link>
      <guid>https://dev.to/aws-builders/hosting-mcp-gateway-registry-on-aws-ecs-a-practical-blueprint-for-enterprise-agentic-ai-systems-18a4</guid>
      <description>&lt;h1&gt;
  
  
  Hosting MCP Gateway Registry on AWS ECS: A Practical Blueprint for Enterprise Agentic AI Systems
&lt;/h1&gt;

&lt;p&gt;AI agents are no longer just demo applications that answer questions.&lt;/p&gt;

&lt;p&gt;They are slowly becoming systems that can take action: search customer records, update opportunities, generate quotes, create tickets, check inventory, read contracts, trigger workflows, and interact with business applications.&lt;/p&gt;

&lt;p&gt;That is where the real enterprise problem begins.&lt;/p&gt;

&lt;p&gt;When an AI agent only chats, the risk is limited. But when an agent starts using tools, APIs, and enterprise systems, we need a much stronger operating model. We need to know what the agent can access, who approved that access, what data it can touch, and how we can monitor every action.&lt;/p&gt;

&lt;p&gt;This is exactly where an &lt;strong&gt;MCP Gateway and Registry&lt;/strong&gt; becomes important.&lt;/p&gt;

&lt;p&gt;The MCP Gateway Registry gives us a central place to register MCP servers, discover available tools, manage authentication, control access, and observe how agents interact with enterprise capabilities.&lt;/p&gt;

&lt;p&gt;In this post, I will walk through how to host an MCP Gateway Registry on AWS using ECS Fargate, following the Terraform AWS ECS deployment model from the MCP Gateway Registry project. The post is based on the repository at &lt;a href="https://github.com/agentic-community/mcp-gateway-registry/tree/main" rel="noopener noreferrer"&gt;https://github.com/agentic-community/mcp-gateway-registry/tree/main&lt;/a&gt;, and all credit goes to its contributors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Problem Matters
&lt;/h2&gt;

&lt;p&gt;In early AI agent projects, the architecture usually starts simple.&lt;/p&gt;

&lt;p&gt;One agent connects to one or two tools.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sales Agent
   |
   |-- Salesforce MCP Server
   |-- Knowledge Base MCP Server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works well for a proof of concept.&lt;/p&gt;

&lt;p&gt;But after some time, more teams start building agents.&lt;/p&gt;

&lt;p&gt;The sales team wants Salesforce and quote tools.&lt;br&gt;
The support team wants ticketing and knowledge base tools.&lt;br&gt;
The finance team wants billing and contract tools.&lt;br&gt;
The delivery team wants Jira, project reports, and document search tools.&lt;br&gt;
The leadership team wants reporting and analytics agents.&lt;/p&gt;

&lt;p&gt;Very quickly, the environment starts looking like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent 1 ---&amp;gt; MCP Server A
Agent 1 ---&amp;gt; MCP Server B
Agent 2 ---&amp;gt; MCP Server A
Agent 2 ---&amp;gt; MCP Server C
Agent 3 ---&amp;gt; MCP Server D
Agent 4 ---&amp;gt; MCP Server B
Agent 5 ---&amp;gt; MCP Server E
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this stage, the issue is no longer just technical integration.&lt;/p&gt;

&lt;p&gt;The real problems are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Who owns each MCP server?
Which agent is allowed to use which server?
What permissions does each tool have?
How do we prevent duplicate MCP servers?
How do we audit tool usage?
How do we onboard new tools safely?
How do we remove old or risky tools?
How do we monitor failures?
How do we stop agents from accessing sensitive systems without approval?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we do not solve this early, the MCP layer can become another uncontrolled integration layer.&lt;/p&gt;

&lt;p&gt;And in enterprise systems, uncontrolled integration always becomes a risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  What an MCP Gateway Registry Actually Does
&lt;/h2&gt;

&lt;p&gt;An MCP Gateway Registry acts as a control plane between AI agents and MCP servers.&lt;/p&gt;

&lt;p&gt;Instead of letting every agent directly connect to every MCP server, we introduce a managed gateway and registry layer.&lt;/p&gt;

&lt;p&gt;The architecture becomes cleaner:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI Agents / Developers / Applications
              |
              v
      MCP Gateway and Registry
              |
              v
        Approved MCP Servers
              |
              v
      Enterprise Applications
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us a much better operating model.&lt;/p&gt;

&lt;p&gt;The registry helps maintain information about available MCP servers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Server name
Owner
Description
Capabilities
Available tools
Security scopes
Environment
Version
Health status
Approval status
Discovery metadata
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gateway helps control and route access:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Authentication
Authorization
Tool discovery
Request routing
Policy enforcement
Logging
Monitoring
Access control
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is important because enterprise agents should not randomly discover and use tools. They should use approved tools with approved scopes through a governed access path.&lt;/p&gt;
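
&lt;p&gt;To illustrate what "approved tools with approved scopes" can mean in code, here is a small Python sketch. It is not the data model of the MCP Gateway Registry project; the &lt;code&gt;RegistryEntry&lt;/code&gt; shape, the scope strings, and the &lt;code&gt;authorize&lt;/code&gt; function are assumptions made for the example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """A subset of the registry fields listed above; the shape is illustrative."""
    name: str
    owner: str
    approval_status: str  # for example "approved" or "pending"
    tools: dict = field(default_factory=dict)  # tool name mapped to its required scope


def authorize(entry, tool, agent_scopes):
    """Allow a call only for an approved server, a registered tool, and a granted scope."""
    if entry.approval_status != "approved":
        return False, "server is not approved"
    if tool not in entry.tools:
        return False, "tool is not registered"
    if entry.tools[tool] not in agent_scopes:
        return False, "agent lacks the required scope"
    return True, "allowed"


crm = RegistryEntry(
    name="salesforce-mcp",
    owner="sales-platform",
    approval_status="approved",
    tools={"search_accounts": "crm:read", "update_opportunity": "crm:write"},
)
print(authorize(crm, "search_accounts", {"crm:read"}))
print(authorize(crm, "update_opportunity", {"crm:read"}))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Returning the reason together with the decision is deliberate: the same value can be written to the audit log when a request is denied.&lt;/p&gt;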

&lt;h2&gt;
  
  
  Why Hosting This on AWS ECS Makes Sense
&lt;/h2&gt;

&lt;p&gt;There are multiple ways to host an MCP Gateway Registry.&lt;/p&gt;

&lt;p&gt;You can run it on virtual machines.&lt;br&gt;
You can deploy it on Kubernetes.&lt;br&gt;
You can run it on ECS.&lt;br&gt;
You can even start with a simple Docker Compose deployment for local testing.&lt;/p&gt;

&lt;p&gt;But for an enterprise-grade AWS deployment, &lt;strong&gt;ECS Fargate is a very practical option&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It gives us a managed container runtime without the operational overhead of managing EC2 worker nodes or a full Kubernetes control plane.&lt;/p&gt;

&lt;p&gt;For this type of gateway, ECS Fargate gives a good balance between simplicity and production readiness.&lt;/p&gt;

&lt;p&gt;Key benefits include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;No EC2 server management
Container-based deployment
Built-in integration with IAM
Easy logging through CloudWatch
Service-level health checks
Integration with Application Load Balancer
Auto-scaling support
Good fit for Terraform automation
Lower operational complexity than Kubernetes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In my view, unless an organization already has a mature EKS platform and Kubernetes operating model, ECS Fargate is a better first choice for hosting this kind of control-plane service.&lt;/p&gt;

&lt;p&gt;Kubernetes gives more flexibility, but it also adds more operational responsibility. For many teams, that is not needed on day one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Target AWS Architecture
&lt;/h2&gt;

&lt;p&gt;A production-style AWS architecture for MCP Gateway Registry can look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Users / Agents / Developers
          |
          v
Route 53 Custom Domain
          |
          v
CloudFront
          |
          v
AWS WAF
          |
          v
Application Load Balancer
          |
          v
ECS Fargate Services
   |          |           |
Registry   Auth Server   Keycloak
   |          |           |
   |          |           v
   |          |      Aurora PostgreSQL
   |
   v
Amazon DocumentDB

Supporting Services:
- AWS Secrets Manager
- CloudWatch Logs
- CloudWatch Alarms
- ECR
- IAM
- ACM
- Optional Prometheus and Grafana
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not just about running containers.&lt;/p&gt;

&lt;p&gt;This architecture gives us:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Secure external access
Managed container hosting
Central authentication
Registry persistence
Secret management
Observability
Certificate management
Custom domain support
Infrastructure automation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the difference between a demo deployment and an enterprise deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core AWS Components
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Amazon ECS Fargate
&lt;/h3&gt;

&lt;p&gt;ECS Fargate runs the containerized services.&lt;/p&gt;

&lt;p&gt;The deployment can include multiple services such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MCP Gateway Registry
Authentication server
Keycloak
MCP gateway service
Sample MCP servers
Sample agents
Observability components
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each service runs as one or more ECS tasks.&lt;/p&gt;

&lt;p&gt;In production, I would recommend separating these into clear services rather than bundling too much into one container. This gives better control over scaling, logging, deployments, and troubleshooting.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Registry service       --&amp;gt; Handles MCP server metadata and discovery
Auth service           --&amp;gt; Handles authentication flow
Keycloak service       --&amp;gt; Identity and access management
Sample MCP services    --&amp;gt; Optional, mostly for demo or validation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production, sample agents and sample MCP servers should be disabled or deployed only in a non-production environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Application Load Balancer
&lt;/h3&gt;

&lt;p&gt;The Application Load Balancer exposes the ECS services through HTTPS endpoints.&lt;/p&gt;

&lt;p&gt;It routes each request to the correct ECS target group.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/registry  --&amp;gt; Registry service
/auth      --&amp;gt; Auth service
/keycloak  --&amp;gt; Keycloak service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, in a cleaner production model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;registry.company.com  --&amp;gt; Registry service
auth.company.com      --&amp;gt; Auth service
kc.company.com        --&amp;gt; Keycloak
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This domain-based separation is better for enterprise usage because it improves clarity, security boundaries, and operational ownership.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. CloudFront
&lt;/h3&gt;

&lt;p&gt;CloudFront can sit in front of the ALB.&lt;/p&gt;

&lt;p&gt;For production, this is useful because it gives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Global edge access
Better TLS handling
Additional protection layer
Integration point for WAF
Cleaner public access pattern
Potential performance benefits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For internal-only deployments, CloudFront may not always be required. But if the registry is accessed by distributed teams, external developers, or cloud-hosted agents, CloudFront becomes useful.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. AWS WAF
&lt;/h3&gt;

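&lt;p&gt;I would strongly recommend protecting internet-facing endpoints with AWS WAF. In practice, the WAF web ACL is associated with the CloudFront distribution or the Application Load Balancer rather than deployed as a separate hop.&lt;/p&gt;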

&lt;p&gt;The MCP gateway is a sensitive entry point because it controls access to tools. So it should not be exposed casually.&lt;/p&gt;

&lt;p&gt;Useful WAF controls include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Rate limiting
AWS managed rule groups
IP restrictions
Bot protection
Geo restrictions if required
SQL injection protection
Cross-site scripting protection
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is especially important if agents, developers, or external systems access the gateway over the internet.&lt;/p&gt;




&lt;h3&gt;
  
  
  5. Route 53 and ACM
&lt;/h3&gt;

&lt;p&gt;Route 53 manages DNS records.&lt;/p&gt;

&lt;p&gt;ACM provides SSL/TLS certificates.&lt;/p&gt;

&lt;p&gt;This gives us clean URLs such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;registry.company.com
auth.company.com
kc.company.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For enterprise adoption, this matters more than people think. Clean domain names make the platform feel like a real internal product rather than a temporary engineering setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Amazon Aurora PostgreSQL
&lt;/h3&gt;

&lt;p&gt;Aurora PostgreSQL is used for Keycloak data.&lt;/p&gt;

&lt;p&gt;Keycloak needs a relational database to store identity-related information, including:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Users
Realms
Clients
Roles
Sessions
Identity provider configuration
Authentication settings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using Aurora gives better reliability than running a database inside a container.&lt;/p&gt;

&lt;p&gt;For production, I would avoid containerized databases for this type of platform. Identity is too important to treat casually.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Amazon DocumentDB
&lt;/h3&gt;

&lt;p&gt;DocumentDB is used by the registry layer.&lt;/p&gt;

&lt;p&gt;This is where MCP server and agent metadata can be stored.&lt;/p&gt;

&lt;p&gt;Example records may include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MCP server name
MCP server URL
Tool list
Tool descriptions
Security scopes
Server health
Owner team
Environment
Version
Approval state
Risk classification
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Over time, this registry becomes the enterprise catalog for agent-accessible capabilities.&lt;/p&gt;

&lt;p&gt;This is very valuable.&lt;/p&gt;

&lt;p&gt;It allows teams to search and discover what tools already exist instead of rebuilding the same MCP servers again and again.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. AWS Secrets Manager
&lt;/h3&gt;

&lt;p&gt;Secrets Manager should be used for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Database credentials
Keycloak admin credentials
JWT secrets
Client secrets
Service credentials
API keys
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No production credential should be hardcoded inside Terraform files, Docker images, or environment files stored in Git.&lt;/p&gt;

&lt;p&gt;This is basic, but it is often missed in early AI platform projects.&lt;/p&gt;
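
&lt;p&gt;For a service that reads a secret at runtime, a minimal Python sketch could look like the following. It uses the boto3 &lt;code&gt;get_secret_value&lt;/code&gt; call; the secret name and the JSON layout of the secret are hypothetical, and a fake client is included so the example runs without AWS access:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json


def load_secret(secret_id, client=None):
    """Fetch a JSON secret from AWS Secrets Manager at runtime instead of hardcoding it."""
    if client is None:
        import boto3  # imported lazily so the function can be tested without AWS access
        client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


class FakeSecretsClient:
    """Local stand-in that mimics the response shape of get_secret_value."""

    def get_secret_value(self, SecretId):
        return {"SecretString": json.dumps({"username": "keycloak", "password": "example-only"})}


# The secret name below is hypothetical.
creds = load_secret("mcp-gateway/keycloak-db", client=FakeSecretsClient())
print(creds["username"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On ECS, an alternative is to reference the secret from the task definition so that it is injected into the container as an environment variable at start-up, which keeps AWS calls out of the application code.&lt;/p&gt;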

&lt;h3&gt;
  
  
  9. CloudWatch Logs and Alarms
&lt;/h3&gt;

&lt;p&gt;Every ECS service should write logs to CloudWatch.&lt;/p&gt;

&lt;p&gt;At minimum, we should monitor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Container startup failures
Authentication failures
Registry API errors
Tool discovery failures
Database connection errors
ECS task restarts
ALB 4xx errors
ALB 5xx errors
High latency
Memory pressure
CPU pressure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But for an MCP gateway, infrastructure logs are not enough.&lt;/p&gt;

&lt;p&gt;We also need agent activity logs.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Which agent requested tool discovery?
Which MCP server was selected?
Which tool was invoked?
Which scope was used?
Was the request allowed or denied?
What was the response status?
How long did the tool call take?
Was sensitive data involved?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where the MCP gateway starts becoming a governance system, not just a routing layer.&lt;/p&gt;
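
&lt;p&gt;One way to capture those answers is to emit a structured log line per tool call. The sketch below is illustrative only: the field names and identifiers are assumptions, not the log format of the MCP Gateway Registry project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
from datetime import datetime, timezone


def audit_event(agent_id, server, tool, scope, allowed, status, duration_ms, sensitive=False):
    """Build one structured log line answering the questions listed above."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "mcp_server": server,
        "tool": tool,
        "scope": scope,
        "decision": "allowed" if allowed else "denied",
        "response_status": status,
        "duration_ms": duration_ms,
        "sensitive_data": sensitive,
    })


# All identifiers here are made up for illustration.
line = audit_event("sales-agent", "salesforce-mcp", "search_accounts", "crm:read",
                   allowed=True, status=200, duration_ms=142)
print(line)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because each line is valid JSON, the events can be filtered by agent, tool, or decision once the container output reaches CloudWatch Logs.&lt;/p&gt;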

&lt;h2&gt;
  
  
  Deployment Options
&lt;/h2&gt;

&lt;p&gt;The Terraform setup supports different deployment modes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1: CloudFront Only
&lt;/h3&gt;

&lt;p&gt;This is useful for a quick POC.&lt;/p&gt;

&lt;p&gt;You do not need a custom domain. You get a CloudFront-generated URL.&lt;/p&gt;

&lt;p&gt;This is suitable for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Internal demo
Engineering validation
Architecture exploration
Short-term sandbox
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not my preferred option for production, but it is a good way to start quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: Custom Domain Only
&lt;/h3&gt;

&lt;p&gt;In this model, Route 53 and ACM are used, but CloudFront is not enabled.&lt;/p&gt;

&lt;p&gt;You get URLs like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;registry.company.com
kc.company.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is better than a randomly generated URL, but it may not give enough edge protection if exposed publicly.&lt;/p&gt;

&lt;p&gt;This can work well for private/internal deployments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 3: CloudFront + Custom Domain
&lt;/h3&gt;

&lt;p&gt;Of the three options, this is the one best suited to production.&lt;/p&gt;

&lt;p&gt;Traffic flows like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User / Agent
    |
    v
Custom Domain
    |
    v
CloudFront
    |
    v
WAF
    |
    v
Application Load Balancer
    |
    v
ECS Fargate Service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives a stronger production posture.&lt;/p&gt;

&lt;p&gt;My recommendation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Use CloudFront + Route 53 + WAF for production.
Use CloudFront-only for demo.
Use the custom-domain-only mode just for controlled internal environments.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
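
&lt;p&gt;The recommendation can be captured as a small lookup that renders the two Terraform toggles per environment. The flag names &lt;code&gt;enable_cloudfront&lt;/code&gt; and &lt;code&gt;enable_route53_dns&lt;/code&gt; appear in the &lt;code&gt;terraform.tfvars&lt;/code&gt; example later in this post; the mode names and the way each mode maps to flag values are my assumptions and should be checked against the repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# The mode names and the mapping from mode to flag values are assumptions,
# not taken from the repository.
MODES = {
    "cloudfront_only": {"enable_cloudfront": True, "enable_route53_dns": False},
    "custom_domain_only": {"enable_cloudfront": False, "enable_route53_dns": True},
    "cloudfront_and_custom_domain": {"enable_cloudfront": True, "enable_route53_dns": True},
}

RECOMMENDED = {
    "demo": "cloudfront_only",
    "internal": "custom_domain_only",
    "production": "cloudfront_and_custom_domain",
}


def tfvars_for(environment):
    """Render the two toggle lines for a given kind of environment."""
    flags = MODES[RECOMMENDED[environment]]
    return "\n".join(f"{name} = {str(value).lower()}" for name, value in flags.items())


print(tfvars_for("production"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;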



&lt;h2&gt;
  
  
  Practical Deployment Flow
&lt;/h2&gt;

&lt;p&gt;The deployment flow can be divided into clear stages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 1: Prepare AWS Account
&lt;/h3&gt;

&lt;p&gt;Before starting, we should decide:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AWS region
VPC strategy
Domain name
Environment name
Access model
CIDR restrictions
Secrets strategy
Terraform state backend
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production, I would not deploy this into a random shared AWS account.&lt;/p&gt;

&lt;p&gt;Better model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Separate AWS account for dev
Separate AWS account for staging
Separate AWS account for production
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At minimum, use separate environments and separate Terraform state.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 2: Build and Push Images to ECR
&lt;/h3&gt;

&lt;p&gt;The services need to be built as Docker images and pushed to Amazon ECR.&lt;/p&gt;

&lt;p&gt;A simplified flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-east-1
make build-push
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is a set of ECR image URIs.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;123456789012.dkr.ecr.us-east-1.amazonaws.com/mcp-gateway-registry:v1.0.0
123456789012.dkr.ecr.us-east-1.amazonaws.com/mcp-gateway-auth:v1.0.0
123456789012.dkr.ecr.us-east-1.amazonaws.com/mcp-gateway:v1.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production, avoid using &lt;code&gt;latest&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Use versioned immutable tags.&lt;/p&gt;

&lt;p&gt;Bad:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mcp-gateway-registry:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mcp-gateway-registry:v1.0.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Best:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mcp-gateway-registry:v1.0.3-build-20260524
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This helps with rollback, audit, and release traceability.&lt;/p&gt;
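
&lt;p&gt;A pipeline can enforce this tagging rule before anything is deployed. The sketch below is a minimal illustration, not part of the project itself: the function name &lt;code&gt;is_release_tag&lt;/code&gt; and the exact tag pattern are assumptions modeled on the example tags above.&lt;/p&gt;

```python
import re

# Hypothetical policy: a release tag looks like v1.0.3 or v1.0.3-build-20260524.
RELEASE_TAG = re.compile(r"v\d+\.\d+\.\d+(-build-\d{8})?")


def is_release_tag(image_ref):
    """Return True when the image reference ends in a versioned tag."""
    # Split on the last colon so a registry host with a port is not read as a tag.
    _, sep, tag = image_ref.rpartition(":")
    if not sep:
        return False  # no tag at all means an implicit "latest"
    return RELEASE_TAG.fullmatch(tag) is not None


for ref in (
    "mcp-gateway-registry:latest",
    "mcp-gateway-registry:v1.0.3",
    "mcp-gateway-registry:v1.0.3-build-20260524",
):
    print(ref, "OK" if is_release_tag(ref) else "REJECTED")
```

&lt;p&gt;With this pattern the &lt;code&gt;latest&lt;/code&gt; reference is rejected and the two versioned references pass.&lt;/p&gt;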

&lt;h3&gt;
  
  
  Stage 3: Configure Terraform Variables
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;terraform.tfvars&lt;/code&gt; file is where we configure the deployment.&lt;/p&gt;

&lt;p&gt;Important values include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;aws_region&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;

&lt;span class="nx"&gt;enable_cloudfront&lt;/span&gt;  &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="nx"&gt;enable_route53_dns&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="nx"&gt;base_domain&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"company.com"&lt;/span&gt;

&lt;span class="nx"&gt;session_cookie_domain&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;".company.com"&lt;/span&gt;
&lt;span class="nx"&gt;session_cookie_secure&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="nx"&gt;ingress_cidr_blocks&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="s2"&gt;"YOUR_OFFICE_IP/32"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s2"&gt;"YOUR_VPN_IP/32"&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Database and admin passwords should not be committed alongside these values.&lt;/p&gt;

&lt;p&gt;In a strong production model, they come from a secure secret injection process, such as AWS Secrets Manager, rather than being manually placed in local files.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 4: Initialize Terraform
&lt;/h3&gt;

&lt;p&gt;Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform init &lt;span class="nt"&gt;-upgrade&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For production, Terraform state should be stored remotely.&lt;/p&gt;

&lt;p&gt;Recommended backend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;S3 bucket for state
DynamoDB table for locking (or S3 native locking on Terraform 1.10+)
KMS encryption
Restricted IAM access
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do not use local state for production.&lt;/p&gt;

&lt;p&gt;Local state is acceptable for learning, but not for enterprise infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 5: Create Certificates First
&lt;/h3&gt;

&lt;p&gt;ACM certificates often require DNS validation.&lt;/p&gt;

&lt;p&gt;That is why the deployment may need a first targeted apply for certificates.&lt;/p&gt;

&lt;p&gt;Conceptually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform apply &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;aws_acm_certificate.keycloak &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;aws_acm_certificate.registry &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;aws_acm_certificate_validation.keycloak &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;aws_acm_certificate_validation.registry
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows certificates to be created and validated before the rest of the infrastructure depends on them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 6: Deploy Full Infrastructure
&lt;/h3&gt;

&lt;p&gt;After certificate validation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This deploys the full stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Networking
Security groups
ECS cluster
ECS services
ALB
Target groups
CloudFront
Route 53 records
Aurora PostgreSQL
DocumentDB
Secrets
CloudWatch logs
IAM roles
Optional observability stack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, the infrastructure is created, but the application may still need initialization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 7: Run Post-Deployment Setup
&lt;/h3&gt;

&lt;p&gt;Post-deployment setup is very important.&lt;/p&gt;

&lt;p&gt;This step usually performs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Terraform output extraction
DNS validation
ECS service health checks
Keycloak realm setup
Client setup
Admin user setup
DocumentDB collection initialization
Registry indexes
Scope setup
Service restart
Endpoint validation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This step converts infrastructure into a usable platform.&lt;/p&gt;

&lt;p&gt;Without this, the containers may be running, but the gateway may not be fully ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the Gateway Should Be Used After Hosting
&lt;/h2&gt;

&lt;p&gt;Once deployed, teams can start registering MCP servers.&lt;/p&gt;

&lt;p&gt;A good MCP server registration should include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Server name
Business capability
Owner team
Technical owner
Environment
Base URL
Supported tools
Required scopes
Risk level
Data classification
Health check endpoint
Approval status
Version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Name: Salesforce Opportunity MCP Server
Owner: Sales Platform Team
Environment: Production
Tools:
- searchOpportunity
- updateOpportunityStage
- getAccountDetails
Scopes:
- salesforce.read
- salesforce.opportunity.update
Risk: High
Data: Customer and revenue data
Approval: Required
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This level of metadata is important.&lt;/p&gt;

&lt;p&gt;Without it, the registry becomes just another technical catalog. With it, the registry becomes a real enterprise control plane.&lt;/p&gt;
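
&lt;p&gt;One way to keep that metadata complete is to reject registrations with missing fields. The following is a rough sketch only: the snake_case key names and the &lt;code&gt;missing_fields&lt;/code&gt; helper are assumptions for illustration, not the registry's actual schema or API.&lt;/p&gt;

```python
# Required registration fields, taken from the list above (key names are hypothetical).
REQUIRED_FIELDS = (
    "name", "business_capability", "owner_team", "technical_owner",
    "environment", "base_url", "tools", "scopes", "risk_level",
    "data_classification", "health_check", "approval_status", "version",
)


def missing_fields(registration):
    """Return the required fields that are absent or empty, in list order."""
    return [field for field in REQUIRED_FIELDS if not registration.get(field)]


# The Salesforce example above, expressed as a dictionary.
entry = {
    "name": "Salesforce Opportunity MCP Server",
    "owner_team": "Sales Platform Team",
    "environment": "Production",
    "tools": ["searchOpportunity", "updateOpportunityStage", "getAccountDetails"],
    "scopes": ["salesforce.read", "salesforce.opportunity.update"],
    "risk_level": "High",
    "data_classification": "Customer and revenue data",
    "approval_status": "Required",
}

print(missing_fields(entry))
```

&lt;p&gt;Run against the example entry, the check reports the fields the example leaves out, such as the base URL, health check endpoint, and version.&lt;/p&gt;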

&lt;h2&gt;
  
  
  Enterprise Governance Model
&lt;/h2&gt;

&lt;p&gt;For enterprise usage, I would define a clear lifecycle for MCP servers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Suggested MCP Server Lifecycle
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Draft
   |
Submitted for Review
   |
Security Review
   |
Approved for Dev
   |
Approved for Production
   |
Monitored
   |
Deprecated
   |
Retired
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every MCP server should have an owner.&lt;/p&gt;

&lt;p&gt;Every high-risk tool should have approval.&lt;/p&gt;

&lt;p&gt;Every production MCP server should have monitoring.&lt;/p&gt;

&lt;p&gt;Every deprecated server should have a retirement date.&lt;/p&gt;

&lt;p&gt;This may sound heavy, but it is necessary once agents start touching real systems.&lt;/p&gt;
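
&lt;p&gt;The lifecycle can be enforced as a small state machine so that no server jumps straight from draft to production. This is a minimal sketch under assumptions: the &lt;code&gt;advance&lt;/code&gt; function is hypothetical, and allowing a review stage to send a server back to Draft is an addition that the diagram above does not show.&lt;/p&gt;

```python
# Allowed transitions, mirroring the lifecycle diagram above.
# Sending a server back to Draft from a review stage is an added assumption.
TRANSITIONS = {
    "Draft": {"Submitted for Review"},
    "Submitted for Review": {"Security Review", "Draft"},
    "Security Review": {"Approved for Dev", "Draft"},
    "Approved for Dev": {"Approved for Production"},
    "Approved for Production": {"Monitored"},
    "Monitored": {"Deprecated"},
    "Deprecated": {"Retired"},
    "Retired": set(),
}


def advance(current, target):
    """Return the new state, or raise if the move is not an allowed transition."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} to {target}")
    return target


state = advance("Draft", "Submitted for Review")
state = advance(state, "Security Review")
print(state)
```

&lt;p&gt;Calling &lt;code&gt;advance("Draft", "Approved for Production")&lt;/code&gt; raises a &lt;code&gt;ValueError&lt;/code&gt;, which is the point: skipping review should be impossible, not merely discouraged.&lt;/p&gt;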




&lt;h2&gt;
  
  
  Access Control Model
&lt;/h2&gt;

&lt;p&gt;The gateway should not allow all agents to use all MCP servers.&lt;/p&gt;

&lt;p&gt;That is a weak design.&lt;/p&gt;

&lt;p&gt;A better model is scope-based access.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent: Sales Copilot
Allowed scopes:
- salesforce.read
- quote.read
- product.search

Not allowed:
- discount.approve
- contract.delete
- customer.export
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Another example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent: Deal Desk Agent
Allowed scopes:
- quote.read
- quote.update
- discount.request
- contract.read

Requires approval:
- discount.approve
- final_quote.submit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is how we prevent agents from becoming over-permissioned.&lt;/p&gt;

&lt;p&gt;One of the biggest risks in agentic AI systems is excessive tool permissions. If one agent gets too many tools and too much authority, its behavior and impact become hard to control.&lt;/p&gt;
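
&lt;p&gt;The two examples above can be expressed as a default-deny lookup. This is only a sketch: the agent identifiers, the &lt;code&gt;POLICIES&lt;/code&gt; table, and the &lt;code&gt;authorize&lt;/code&gt; function are hypothetical and do not represent the gateway's real authorization API.&lt;/p&gt;

```python
# Per-agent policy, copied from the Sales Copilot and Deal Desk examples above.
POLICIES = {
    "sales-copilot": {
        "allowed": {"salesforce.read", "quote.read", "product.search"},
        "approval": set(),
    },
    "deal-desk-agent": {
        "allowed": {"quote.read", "quote.update", "discount.request", "contract.read"},
        "approval": {"discount.approve", "final_quote.submit"},
    },
}


def authorize(agent, scope):
    """Return 'allowed', 'needs_approval', or 'denied' for one tool call."""
    policy = POLICIES.get(agent)
    if policy is None:
        return "denied"  # unknown agents get nothing by default
    if scope in policy["allowed"]:
        return "allowed"
    if scope in policy["approval"]:
        return "needs_approval"
    return "denied"  # anything not listed is refused


print(authorize("sales-copilot", "salesforce.read"))
print(authorize("sales-copilot", "discount.approve"))
print(authorize("deal-desk-agent", "discount.approve"))
```

&lt;p&gt;The important design choice is the last line of &lt;code&gt;authorize&lt;/code&gt;: a scope that is not explicitly granted is denied, so adding a new tool never silently widens an agent's reach.&lt;/p&gt;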

&lt;h2&gt;
  
  
  Observability for Agentic Systems
&lt;/h2&gt;

&lt;p&gt;Traditional application monitoring is not enough here.&lt;/p&gt;

&lt;p&gt;We need both system observability and agent observability.&lt;/p&gt;

&lt;h3&gt;
  
  
  System Observability
&lt;/h3&gt;

&lt;p&gt;Track:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CPU
Memory
Container restarts
Task failures
ALB errors
Request latency
Database connections
Authentication errors
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Agent and Tool Observability
&lt;/h3&gt;

&lt;p&gt;Track:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent ID
User ID
Tool requested
MCP server used
Scope used
Decision outcome
Policy result
Execution latency
Failure reason
Data classification
External system touched
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, a useful audit log entry may look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sales-copilot"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"john@company.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcp_server"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"salesforce-opportunity-server"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"updateOpportunityStage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"salesforce.opportunity.update"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"decision"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"allowed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-24T10:15:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"latency_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;450&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"success"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This type of logging becomes extremely important when something goes wrong.&lt;/p&gt;

&lt;p&gt;If an agent updates the wrong opportunity or calls a pricing tool incorrectly, we should be able to reconstruct exactly what happened.&lt;/p&gt;
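
&lt;p&gt;Producing such entries consistently is easier when one helper builds them. The sketch below assumes a hypothetical &lt;code&gt;audit_record&lt;/code&gt; function and emits one JSON object per line; where the line is shipped (CloudWatch or elsewhere) is left out.&lt;/p&gt;

```python
import json
from datetime import datetime, timezone


def audit_record(agent, user, mcp_server, tool, scope, decision, latency_ms, status):
    """Build one audit entry with the same fields as the sample above."""
    return {
        "agent": agent,
        "user": user,
        "mcp_server": mcp_server,
        "tool": tool,
        "scope": scope,
        "decision": decision,
        # UTC timestamp in ISO 8601 form with second precision
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "latency_ms": latency_ms,
        "status": status,
    }


record = audit_record(
    "sales-copilot", "john@company.com", "salesforce-opportunity-server",
    "updateOpportunityStage", "salesforce.opportunity.update",
    "allowed", 450, "success",
)
# One JSON object per line is easy to search and to ship to a log pipeline.
print(json.dumps(record))
```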

&lt;h2&gt;
  
  
  CI/CD Model
&lt;/h2&gt;

&lt;p&gt;For production, deployment should not be manual.&lt;/p&gt;

&lt;p&gt;A good CI/CD pipeline should look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Developer raises PR
        |
Code review
        |
Build Docker images
        |
Run unit tests
        |
Run container security scan
        |
Push image to ECR
        |
Terraform plan
        |
Manual approval for production
        |
Terraform apply
        |
Run post-deployment setup
        |
Smoke test
        |
Notify platform team
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps the deployment controlled and auditable.&lt;/p&gt;

&lt;p&gt;For rollback, the team should be able to redeploy a previous image tag quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recommended Environment Strategy
&lt;/h2&gt;

&lt;p&gt;I would recommend at least three environments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Development
Staging
Production
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Development
&lt;/h3&gt;

&lt;p&gt;Used for engineering testing.&lt;/p&gt;

&lt;p&gt;Can have relaxed settings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sample MCP servers allowed
Lower database capacity
CloudFront-only mode acceptable
Limited monitoring
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Staging
&lt;/h3&gt;

&lt;p&gt;Used for pre-production validation.&lt;/p&gt;

&lt;p&gt;Should be close to production.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Custom domain
WAF enabled
Production-like IAM
Production-like secrets
Observability enabled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Production
&lt;/h3&gt;

&lt;p&gt;Used for real enterprise agents.&lt;/p&gt;

&lt;p&gt;Should be hardened.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Separate AWS account
CloudFront + WAF
Private subnets
Strict ingress
Immutable images
Centralized logs
Audit trail
Backup enabled
Approval workflow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Production Hardening Checklist
&lt;/h2&gt;

&lt;p&gt;Before calling this production-ready, I would validate the following.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Remote Terraform state enabled
Terraform state encrypted
DynamoDB locking enabled
Separate AWS accounts or environments
Secrets stored in Secrets Manager
No secrets in Git
CloudFront enabled
WAF enabled
Ingress restricted
Keycloak admin access restricted
ECS tasks in private subnets
ALB security groups reviewed
Aurora backups enabled
DocumentDB backups enabled
CloudWatch alarms configured
Container image scanning enabled
Immutable image tags used
IAM least privilege applied
Audit logging enabled
MCP server ownership defined
Tool scopes defined
Production approval process defined
Runbook created
Rollback process tested
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The most common mistake is to stop after the Terraform deployment succeeds.&lt;/p&gt;

&lt;p&gt;That only means infrastructure exists.&lt;/p&gt;

&lt;p&gt;It does not mean the platform is secure, governed, observable, or ready for production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operational Runbook
&lt;/h2&gt;

&lt;p&gt;For a serious enterprise setup, the platform team should maintain a simple runbook.&lt;/p&gt;

&lt;p&gt;The runbook should answer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;How do we onboard a new MCP server?
How do we approve a production MCP server?
How do we revoke access?
How do we rotate secrets?
How do we check service health?
How do we debug registry failures?
How do we debug authentication failures?
How do we roll back a release?
How do we retire an old MCP server?
How do we investigate suspicious tool usage?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where platform maturity comes in.&lt;/p&gt;

&lt;p&gt;An MCP gateway is not a one-time deployment. It becomes part of the agentic AI platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where This Fits in an Enterprise Agent Architecture
&lt;/h2&gt;

&lt;p&gt;In a broader enterprise agentic AI architecture, the MCP Gateway Registry sits between orchestration and enterprise tools.&lt;/p&gt;

&lt;p&gt;A practical model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Interface
      |
      v
Agent Orchestrator
      |
      v
Policy / Guardrail Layer
      |
      v
MCP Gateway Registry
      |
      v
MCP Servers
      |
      v
Enterprise Systems
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The orchestrator decides what needs to be done.&lt;/p&gt;

&lt;p&gt;The policy layer checks whether the action is allowed.&lt;/p&gt;

&lt;p&gt;The MCP gateway provides controlled tool discovery and access.&lt;/p&gt;

&lt;p&gt;The MCP server performs the actual system interaction.&lt;/p&gt;

&lt;p&gt;This separation is important.&lt;/p&gt;

&lt;p&gt;Do not put all responsibilities into one big agent.&lt;/p&gt;

&lt;p&gt;That becomes hard to scale, hard to debug, and dangerous to govern.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Practical Recommendation
&lt;/h2&gt;

&lt;p&gt;For a real enterprise deployment, I would host the MCP Gateway Registry with this setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AWS ECS Fargate for services
CloudFront in front
AWS WAF enabled
Route 53 custom domains
ACM certificates
Application Load Balancer
Private subnets for ECS tasks
Aurora PostgreSQL for Keycloak
DocumentDB for registry metadata
Secrets Manager for credentials
CloudWatch for logs and alarms
Optional Grafana and Prometheus for deeper observability
S3 backend for Terraform state
DynamoDB for Terraform locking
CI/CD for image build and deployment
Immutable ECR image tags
Strict admin access
Scope-based authorization
Audit logs for all tool usage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a POC, I would keep it simple.&lt;/p&gt;

&lt;p&gt;For production, I would not compromise on security, logging, or access control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Lessons Learned
&lt;/h2&gt;

&lt;p&gt;The biggest lesson is this:&lt;/p&gt;

&lt;p&gt;Hosting the MCP Gateway Registry is not only an infrastructure activity. It is the beginning of an operating model for enterprise agents.&lt;/p&gt;

&lt;p&gt;If agents are going to use real tools, then organizations need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tool ownership
Tool approval
Tool discovery
Tool scopes
Tool observability
Tool lifecycle management
Tool risk classification
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this, agentic AI systems may work technically but fail operationally.&lt;/p&gt;

&lt;p&gt;And in enterprises, operational failure is usually what blocks adoption.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;MCP is standardizing tool integration for AI agents. That is a very important shift.&lt;/p&gt;

&lt;p&gt;But standardization also creates scale.&lt;/p&gt;

&lt;p&gt;And once we scale the number of agents and tools, we need governance.&lt;/p&gt;

&lt;p&gt;That is why an MCP Gateway Registry should be treated as a core platform capability, not as a side component.&lt;/p&gt;

&lt;p&gt;It gives engineering teams a structured way to expose tools.&lt;br&gt;
It gives security teams a way to control access.&lt;br&gt;
It gives platform teams a way to monitor usage.&lt;br&gt;
It gives business teams more confidence that agents are not directly and blindly touching enterprise systems.&lt;/p&gt;

&lt;p&gt;In my view, this is one of the important building blocks for production-grade agentic AI systems.&lt;/p&gt;

&lt;p&gt;The future will not be one agent directly connected to many tools.&lt;/p&gt;

&lt;p&gt;The future will be governed agent ecosystems, where tools are registered, discoverable, monitored, secured, and lifecycle-managed through a central control plane.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>agents</category>
      <category>ecs</category>
      <category>mcp</category>
    </item>
    <item>
      <title>When One AI Agent Is Not Enough: A Practical Delegation Pattern for Enterprise Systems</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Sat, 23 May 2026 17:50:08 +0000</pubDate>
      <link>https://dev.to/aws-builders/when-one-ai-agent-is-not-enough-a-practical-delegation-pattern-for-enterprise-systems-16nb</link>
      <guid>https://dev.to/aws-builders/when-one-ai-agent-is-not-enough-a-practical-delegation-pattern-for-enterprise-systems-16nb</guid>
      <description>&lt;h1&gt;
  
  
  When One AI Agent Is Not Enough: A Practical Delegation Pattern for Enterprise Systems
&lt;/h1&gt;

&lt;p&gt;A lot of enterprise AI systems start the same way.&lt;/p&gt;

&lt;p&gt;One agent.&lt;br&gt;
One big prompt.&lt;br&gt;
A bunch of tools.&lt;br&gt;
A lot of hope.&lt;/p&gt;

&lt;p&gt;At first, it looks great. The agent can answer questions, call a few systems, maybe even complete a useful workflow. But once the use case gets more realistic, cracks start to show.&lt;/p&gt;

&lt;p&gt;The agent has to understand too much.&lt;br&gt;
It has to access too many systems.&lt;br&gt;
It has to make too many different kinds of decisions.&lt;br&gt;
And when something goes wrong, it is hard to tell where the problem actually is.&lt;/p&gt;

&lt;p&gt;That is usually the point where the issue stops being “prompt quality” and starts becoming “system design.”&lt;/p&gt;

&lt;p&gt;One pattern I’ve found especially useful is delegation across agents and subagents.&lt;/p&gt;

&lt;p&gt;Not because it sounds advanced.&lt;br&gt;
Because it is often the more practical way to build enterprise AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real problem with a single large agent
&lt;/h2&gt;

&lt;p&gt;There is an appealing simplicity in saying, “Let one agent handle the whole thing.”&lt;/p&gt;

&lt;p&gt;But enterprise workflows are rarely that clean.&lt;/p&gt;

&lt;p&gt;Take something simple on the surface, like a customer escalation.&lt;/p&gt;

&lt;p&gt;To handle it well, the system may need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pull ticket history&lt;/li&gt;
&lt;li&gt;understand product context&lt;/li&gt;
&lt;li&gt;check support policy&lt;/li&gt;
&lt;li&gt;review account state&lt;/li&gt;
&lt;li&gt;recommend next actions&lt;/li&gt;
&lt;li&gt;trigger an internal workflow&lt;/li&gt;
&lt;li&gt;draft a reply&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Yes, one agent can try to do all of that.&lt;/p&gt;

&lt;p&gt;But in practice, the more responsibilities you pile into one agent, the more fragile it becomes.&lt;/p&gt;

&lt;p&gt;You usually end up with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;too much context going into one step&lt;/li&gt;
&lt;li&gt;too many tools available to one component&lt;/li&gt;
&lt;li&gt;weaker predictability&lt;/li&gt;
&lt;li&gt;weaker governance&lt;/li&gt;
&lt;li&gt;and much harder debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system may still “work,” but it becomes difficult to trust.&lt;/p&gt;

&lt;h1&gt;
  
  
  A better pattern: one lead agent, a few focused subagents
&lt;/h1&gt;

&lt;p&gt;The cleaner pattern is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Primary agent -&amp;gt; specialist subagents -&amp;gt; final outcome&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;primary agent&lt;/strong&gt; owns the workflow.&lt;/p&gt;

&lt;p&gt;Its job is to understand the request, decide what needs to happen, delegate the right pieces of work, and then combine the results.&lt;/p&gt;

&lt;p&gt;The subagents each do one thing well.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a retrieval subagent gets the right context&lt;/li&gt;
&lt;li&gt;a policy subagent checks rules or entitlements&lt;/li&gt;
&lt;li&gt;an analysis subagent recommends next steps&lt;/li&gt;
&lt;li&gt;an execution subagent handles approved downstream actions&lt;/li&gt;
&lt;li&gt;a communication subagent drafts the final message&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a much healthier design than asking one broad agent to do everything in one pass.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this pattern works better
&lt;/h2&gt;

&lt;p&gt;The first reason is simple: &lt;strong&gt;focus&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A retrieval subagent can focus on retrieval.&lt;br&gt;
A policy subagent can focus on policy.&lt;br&gt;
An execution subagent can focus on action.&lt;/p&gt;

&lt;p&gt;You are not forcing one component to juggle too many responsibilities.&lt;/p&gt;

&lt;p&gt;The second reason is control.&lt;/p&gt;

&lt;p&gt;Different subagents can have different permissions, different tools, and different operating boundaries. That is much easier to govern in enterprise systems.&lt;/p&gt;

&lt;p&gt;The third reason is observability.&lt;/p&gt;

&lt;p&gt;If the outcome is wrong, you have a better shot at knowing where it went wrong:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;bad retrieval&lt;/li&gt;
&lt;li&gt;wrong policy interpretation&lt;/li&gt;
&lt;li&gt;weak action selection&lt;/li&gt;
&lt;li&gt;poor response generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a huge advantage once the system moves beyond demo stage.&lt;/p&gt;

&lt;h1&gt;
  
  
  What the primary agent should actually do
&lt;/h1&gt;

&lt;p&gt;One mistake I see is treating the primary agent like a simple router.&lt;/p&gt;

&lt;p&gt;That is not enough.&lt;/p&gt;

&lt;p&gt;The primary agent should behave more like a coordinator.&lt;/p&gt;

&lt;p&gt;It should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;understand the incoming request&lt;/li&gt;
&lt;li&gt;decide what subtasks are needed&lt;/li&gt;
&lt;li&gt;choose the right subagents&lt;/li&gt;
&lt;li&gt;pass only the necessary context&lt;/li&gt;
&lt;li&gt;review what comes back&lt;/li&gt;
&lt;li&gt;and decide whether to continue, retry, escalate, or stop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, it owns the workflow logic.&lt;/p&gt;

&lt;p&gt;It should not blindly trust every subagent output.&lt;br&gt;
It should have judgment.&lt;/p&gt;

&lt;p&gt;That is what makes delegation useful rather than just decorative.&lt;/p&gt;
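
&lt;p&gt;The coordinator role described above can be sketched in a few lines. Everything here is an assumption made for illustration: the subagents are plain functions standing in for model-backed workers, and the case fields (&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;plan&lt;/code&gt;) are invented.&lt;/p&gt;

```python
# Hypothetical subagents: plain functions standing in for model-backed workers.
def retrieval(request):
    return {"ok": True, "history": ["ticket-1", "ticket-2"]}


def policy(request):
    return {"ok": True, "entitled": request.get("plan") == "premium"}


def primary_agent(case):
    """Coordinate the case: delegate, review each result, then decide."""
    context = retrieval({"id": case["id"]})  # pass only what retrieval needs
    if not context["ok"]:
        return "escalate"  # do not continue on a failed retrieval
    verdict = policy({"plan": case["plan"]})
    if not verdict["ok"]:
        return "retry"
    return "continue" if verdict["entitled"] else "stop"


print(primary_agent({"id": "C-42", "plan": "premium", "notes": "long free text"}))
```

&lt;p&gt;Note that the free-text &lt;code&gt;notes&lt;/code&gt; field never reaches either subagent, and that each result is checked before the next step runs.&lt;/p&gt;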

&lt;h1&gt;
  
  
  What makes a good subagent
&lt;/h1&gt;

&lt;p&gt;A good subagent is narrow.&lt;/p&gt;

&lt;p&gt;That is probably the single most important design rule.&lt;/p&gt;

&lt;p&gt;Each subagent should ideally have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one clear job&lt;/li&gt;
&lt;li&gt;limited tools&lt;/li&gt;
&lt;li&gt;limited context&lt;/li&gt;
&lt;li&gt;a defined output format&lt;/li&gt;
&lt;li&gt;clear boundaries on what it should not do&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If a subagent is doing retrieval, analysis, execution, and communication together, it is no longer a real specialist.&lt;/p&gt;

&lt;p&gt;It is just another general-purpose agent with a different label. And once you do that, the value of delegation starts disappearing.&lt;/p&gt;

&lt;h1&gt;
  
  
  A sharper example
&lt;/h1&gt;

&lt;p&gt;Let’s go back to the customer escalation example.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bad design
&lt;/h3&gt;

&lt;p&gt;One large agent receives the case and tries to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read the issue&lt;/li&gt;
&lt;li&gt;search past history&lt;/li&gt;
&lt;li&gt;check policy&lt;/li&gt;
&lt;li&gt;assess severity&lt;/li&gt;
&lt;li&gt;decide the next action&lt;/li&gt;
&lt;li&gt;update internal systems&lt;/li&gt;
&lt;li&gt;draft the reply&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This may work sometimes.&lt;/p&gt;

&lt;p&gt;But it is too much responsibility in one place.&lt;/p&gt;

&lt;h3&gt;
  
  
  Better design
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Primary agent&lt;/strong&gt;&lt;br&gt;
Owns the overall case flow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Retrieval subagent&lt;/strong&gt;&lt;br&gt;
Gathers ticket history, account context, product details, and related documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Policy subagent&lt;/strong&gt;&lt;br&gt;
Checks entitlement, SLA, escalation rules, and any support constraints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Analysis subagent&lt;/strong&gt;&lt;br&gt;
Looks at the combined context and suggests the best next step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Execution subagent&lt;/strong&gt;&lt;br&gt;
Triggers the approved workflow, creates tasks, or updates systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Communication subagent&lt;/strong&gt;&lt;br&gt;
Drafts the customer-facing or internal message.&lt;/p&gt;

&lt;p&gt;Now the workflow is clearer.&lt;br&gt;
Each step is easier to test.&lt;br&gt;
And if the result is weak, you can usually tell why.&lt;/p&gt;

&lt;h1&gt;
  
  
  When delegation is worth it
&lt;/h1&gt;

&lt;p&gt;Not every use case needs this pattern.&lt;/p&gt;

&lt;p&gt;Sometimes one well-designed agent is enough.&lt;/p&gt;

&lt;p&gt;Delegation becomes useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the workflow crosses different domains&lt;/li&gt;
&lt;li&gt;different systems or permissions are involved&lt;/li&gt;
&lt;li&gt;some work can happen in parallel&lt;/li&gt;
&lt;li&gt;one agent is becoming overloaded&lt;/li&gt;
&lt;li&gt;governance starts getting messy&lt;/li&gt;
&lt;li&gt;you want better testing and failure isolation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the workflow is small and bounded, keep it simple.&lt;/p&gt;

&lt;p&gt;The point is not to add more agents for the sake of it.&lt;br&gt;
The point is to use delegation when specialization clearly improves the system.&lt;/p&gt;

&lt;h1&gt;
  
  
  Practical rules that help
&lt;/h1&gt;

&lt;h2&gt;
  
  
  1. Start with a small number of subagents
&lt;/h2&gt;

&lt;p&gt;Do not build a maze.&lt;/p&gt;

&lt;p&gt;Start with one primary agent and maybe two or three specialists. That is usually enough to prove whether the pattern is helping.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Keep context tight
&lt;/h2&gt;

&lt;p&gt;Do not pass everything to every agent.&lt;/p&gt;

&lt;p&gt;Each subagent should get only the context it actually needs. Too much context often makes outputs worse, not better.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Use structured outputs
&lt;/h2&gt;

&lt;p&gt;Subagents should return something predictable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a decision&lt;/li&gt;
&lt;li&gt;a label&lt;/li&gt;
&lt;li&gt;a ranked list&lt;/li&gt;
&lt;li&gt;a JSON object&lt;/li&gt;
&lt;li&gt;a recommendation plus confidence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not vague prose that another component has to guess at.&lt;/p&gt;
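
&lt;p&gt;A minimal sketch of what a predictable return value could look like. The &lt;code&gt;SubagentResult&lt;/code&gt; class, its field names, and the sample values are assumptions made for illustration, not part of any specific framework.&lt;/p&gt;

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SubagentResult:
    """Hypothetical envelope a subagent hands back to the primary agent."""

    subagent: str            # which specialist produced the result
    decision: str            # a label such as "escalate" or "no_action"
    confidence: float        # self-reported, expected between 0.0 and 1.0
    evidence: tuple = ()     # short references the decision relied on

    def __post_init__(self):
        # Reject out-of-range confidence so bad values fail loudly.
        if self.confidence > 1.0 or 0.0 > self.confidence:
            raise ValueError("confidence must be between 0.0 and 1.0")


def to_json(result):
    """Serialize the result so the next component parses it instead of guessing."""
    return json.dumps(asdict(result), sort_keys=True)


result = SubagentResult("policy", "escalate", 0.82, ("sla-tier-2",))
print(to_json(result))
```

&lt;p&gt;Because every subagent returns the same shape, the primary agent can branch on &lt;code&gt;decision&lt;/code&gt; and &lt;code&gt;confidence&lt;/code&gt; without interpreting free text.&lt;/p&gt;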

&lt;h2&gt;
  
  
  4. Design low-confidence paths
&lt;/h2&gt;

&lt;p&gt;If a subagent is not confident, that should trigger something explicit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retry&lt;/li&gt;
&lt;li&gt;clarification&lt;/li&gt;
&lt;li&gt;fallback logic&lt;/li&gt;
&lt;li&gt;human review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do not let weak outputs quietly flow into the rest of the chain.&lt;/p&gt;
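
&lt;p&gt;One way to make the low-confidence path explicit is a small decision function that the primary agent calls after each subagent returns. The threshold of 0.7, the single retry, and the action names below are illustrative assumptions, not recommended values.&lt;/p&gt;

```python
def next_step(confidence, attempts, threshold=0.7, max_retries=1):
    """Map a subagent's confidence to an explicit action.

    confidence: score reported by the subagent (0.0 to 1.0)
    attempts:   how many retries have already been made for this task
    """
    if confidence >= threshold:
        return "continue"        # output is strong enough to pass along
    if max_retries > attempts:
        return "retry"           # try again before involving a person
    return "human_review"        # retries exhausted, escalate explicitly


print(next_step(0.9, attempts=0))
print(next_step(0.4, attempts=0))
print(next_step(0.4, attempts=1))
```

&lt;p&gt;The important property is that every outcome is named. A weak result either gets another attempt or lands in front of a person; it never continues by default.&lt;/p&gt;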

&lt;h2&gt;
  
  
  5. Log the handoffs
&lt;/h2&gt;

&lt;p&gt;You need to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what task was delegated&lt;/li&gt;
&lt;li&gt;what context was passed&lt;/li&gt;
&lt;li&gt;what came back&lt;/li&gt;
&lt;li&gt;what happened next&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that, debugging becomes painful very quickly.&lt;/p&gt;
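
&lt;p&gt;The four questions above map naturally onto one structured log record per handoff. This is a sketch under assumptions: the field names and the &lt;code&gt;handoff_record&lt;/code&gt; helper are hypothetical, and it logs only the names of the context keys rather than their values, on the assumption that context may contain sensitive data.&lt;/p&gt;

```python
import json
import time
import uuid


def handoff_record(task, context_keys, result, next_action, parent_id=None):
    """Build one log entry describing a single delegation."""
    return {
        "handoff_id": str(uuid.uuid4()),       # unique id for this handoff
        "parent_id": parent_id,                # links handoffs into a chain
        "timestamp": time.time(),
        "task": task,                          # what task was delegated
        "context_keys": sorted(context_keys),  # what context was passed
        "result": result,                      # what came back
        "next_action": next_action,            # what happened next
    }


record = handoff_record(
    task="check_entitlement",
    context_keys={"account_id", "sla"},
    result={"decision": "entitled", "confidence": 0.91},
    next_action="execute",
)
print(json.dumps(record))
```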

&lt;h2&gt;
  
  
  6. Control tools by role
&lt;/h2&gt;

&lt;p&gt;A retrieval subagent should not have broad execution rights.&lt;br&gt;
An execution subagent should not have unnecessary access to everything.&lt;br&gt;
Different responsibilities should have different permissions.&lt;/p&gt;

&lt;p&gt;That is one of the easiest ways to keep governance strong.&lt;/p&gt;
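
&lt;p&gt;Role-based tool control can start as a plain allowlist that is checked before any tool call. The role names, tool names, and the &lt;code&gt;authorize&lt;/code&gt; helper here are hypothetical; a real system would more likely enforce this through its identity and permission layer.&lt;/p&gt;

```python
# Hypothetical mapping from subagent role to the tools it may call.
ROLE_TOOLS = {
    "retrieval": frozenset({"search_tickets", "get_account"}),
    "execution": frozenset({"create_task", "update_ticket"}),
}


class ToolNotPermitted(Exception):
    """Raised when a role tries to call a tool outside its allowlist."""


def authorize(role, tool):
    """Allow the call only if the tool is on the role's allowlist."""
    allowed = ROLE_TOOLS.get(role, frozenset())  # unknown roles get nothing
    if tool not in allowed:
        raise ToolNotPermitted(f"{role!r} may not call {tool!r}")
    return True


print(authorize("retrieval", "search_tickets"))
```

&lt;p&gt;Defaulting unknown roles to an empty set means a misconfigured subagent fails closed rather than inheriting broad access.&lt;/p&gt;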

&lt;h1&gt;
  
  
  Common mistakes
&lt;/h1&gt;

&lt;p&gt;A few patterns show up again and again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Too many agents too early&lt;/strong&gt;&lt;br&gt;
More moving parts do not automatically make the design better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Subagents with overlapping jobs&lt;/strong&gt;&lt;br&gt;
If roles are fuzzy, delegation becomes noisy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Passing all context everywhere&lt;/strong&gt;&lt;br&gt;
That weakens specialization fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No fallback design&lt;/strong&gt;&lt;br&gt;
One failed subtask should not silently break the whole workflow.&lt;/p&gt;

&lt;p&gt;The common thread: delegation is an architecture pattern, and it has to be designed like one.&lt;/p&gt;

&lt;h1&gt;
  
  
  Final thought
&lt;/h1&gt;

&lt;p&gt;Delegation across agents and subagents is one of the more practical patterns in enterprise AI.&lt;/p&gt;

&lt;p&gt;Not because it is clever.&lt;br&gt;
Because it reflects how real systems usually need to operate.&lt;/p&gt;

&lt;p&gt;The strongest setups are usually not the ones with the most agents.&lt;/p&gt;

&lt;p&gt;They are the ones where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the primary agent clearly owns the workflow&lt;/li&gt;
&lt;li&gt;the subagents are genuinely specialized&lt;/li&gt;
&lt;li&gt;the context is controlled&lt;/li&gt;
&lt;li&gt;the outputs are structured&lt;/li&gt;
&lt;li&gt;and the operating model is easy to debug and govern&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is what turns a multi-agent design from an interesting idea into something you can actually run in production.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>enterpriseai</category>
      <category>agenticai</category>
    </item>
    <item>
      <title>A Scaling Lesson Building Production-Grade Agentic AI Systems</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Tue, 19 May 2026 18:30:50 +0000</pubDate>
      <link>https://dev.to/amitkayal/a-scaling-lesson-building-production-grade-agentic-ai-systems-4kgp</link>
      <guid>https://dev.to/amitkayal/a-scaling-lesson-building-production-grade-agentic-ai-systems-4kgp</guid>
      <description>&lt;h1&gt;
  
  
  A Scaling Lesson Building Production-Grade Agentic AI Systems
&lt;/h1&gt;

&lt;p&gt;One of the early observations we had while designing enterprise AI agents was this:&lt;/p&gt;

&lt;p&gt;Giving an agent more tools does not necessarily make it smarter.&lt;/p&gt;

&lt;p&gt;In theory, the opposite sounded correct.&lt;/p&gt;

&lt;p&gt;If an agent had access to customer systems, payment systems, inventory, shipping, reporting, ticketing, email, scheduling, analytics, and internal knowledge bases, it should become more powerful and autonomous.&lt;/p&gt;

&lt;p&gt;But what we observed in real implementations was very different.&lt;/p&gt;

&lt;p&gt;The more tools we added, the more unstable the system became.&lt;/p&gt;

&lt;p&gt;Not because the model was weak.&lt;/p&gt;

&lt;p&gt;Not because the tools were poorly built.&lt;/p&gt;

&lt;p&gt;But because the agent’s decision space became too large.&lt;/p&gt;

&lt;p&gt;For every user request, the agent had to evaluate all available tools, compare descriptions, infer intent, decide sequencing, and determine the best execution path.&lt;/p&gt;

&lt;p&gt;Now imagine doing this with 18 tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer lookup&lt;/li&gt;
&lt;li&gt;Order search&lt;/li&gt;
&lt;li&gt;Refund processing&lt;/li&gt;
&lt;li&gt;Inventory checking&lt;/li&gt;
&lt;li&gt;Shipping tracking&lt;/li&gt;
&lt;li&gt;Email sending&lt;/li&gt;
&lt;li&gt;Ticket creation&lt;/li&gt;
&lt;li&gt;Knowledge base search&lt;/li&gt;
&lt;li&gt;Sentiment analysis&lt;/li&gt;
&lt;li&gt;Language translation&lt;/li&gt;
&lt;li&gt;Calendar scheduling&lt;/li&gt;
&lt;li&gt;Report generation&lt;/li&gt;
&lt;li&gt;Data export&lt;/li&gt;
&lt;li&gt;User authentication&lt;/li&gt;
&lt;li&gt;Payment processing&lt;/li&gt;
&lt;li&gt;Discount application&lt;/li&gt;
&lt;li&gt;Feedback collection&lt;/li&gt;
&lt;li&gt;Escalation routing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Initially, everything looked manageable.&lt;/p&gt;

&lt;p&gt;But as workflows became more dynamic, we started observing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;wrong tool selection,&lt;/li&gt;
&lt;li&gt;unnecessary tool chaining,&lt;/li&gt;
&lt;li&gt;higher latency,&lt;/li&gt;
&lt;li&gt;increased token usage,&lt;/li&gt;
&lt;li&gt;inconsistent execution paths,&lt;/li&gt;
&lt;li&gt;and occasional hallucinated actions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem was not intelligence.&lt;/p&gt;

&lt;p&gt;The problem was cognitive overload inside the orchestration layer.&lt;/p&gt;

&lt;p&gt;Over time, one pattern became very clear:&lt;/p&gt;

&lt;p&gt;Agents perform significantly better when their responsibility boundaries are smaller.&lt;/p&gt;

&lt;p&gt;In our experience, once an agent moves beyond roughly 4–5 actively usable tools, reliability starts dropping rapidly. Enterprise orchestration guidance increasingly recommends smaller, specialized agents instead of monolithic “super agents.”&lt;/p&gt;

&lt;p&gt;That observation changed how we started designing AI systems.&lt;/p&gt;

&lt;p&gt;Instead of building one massive “do everything” agent, we moved toward specialized agents with tightly scoped responsibilities.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;A support agent handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;customer lookup,&lt;/li&gt;
&lt;li&gt;ticket creation,&lt;/li&gt;
&lt;li&gt;escalation routing,&lt;/li&gt;
&lt;li&gt;knowledge retrieval.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A commerce agent handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;orders,&lt;/li&gt;
&lt;li&gt;refunds,&lt;/li&gt;
&lt;li&gt;discounts,&lt;/li&gt;
&lt;li&gt;payments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An operations agent handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;shipping,&lt;/li&gt;
&lt;li&gt;inventory,&lt;/li&gt;
&lt;li&gt;reporting,&lt;/li&gt;
&lt;li&gt;exports.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This immediately improved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tool accuracy,&lt;/li&gt;
&lt;li&gt;execution consistency,&lt;/li&gt;
&lt;li&gt;observability,&lt;/li&gt;
&lt;li&gt;debugging,&lt;/li&gt;
&lt;li&gt;latency,&lt;/li&gt;
&lt;li&gt;and operational trust.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But another important learning came later.&lt;/p&gt;

&lt;p&gt;Even after distributing tools properly, systems still degraded when too many agents were active simultaneously.&lt;/p&gt;

&lt;p&gt;This is something many teams underestimate.&lt;/p&gt;

&lt;p&gt;As the number of agents increases, coordination overhead also increases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more inter-agent communication,&lt;/li&gt;
&lt;li&gt;more memory synchronization,&lt;/li&gt;
&lt;li&gt;more orchestration reasoning,&lt;/li&gt;
&lt;li&gt;more retries,&lt;/li&gt;
&lt;li&gt;more conflict resolution,&lt;/li&gt;
&lt;li&gt;and more state tracking.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At lower scale, this is manageable.&lt;/p&gt;

&lt;p&gt;At enterprise scale, it becomes a serious engineering challenge.&lt;/p&gt;

&lt;p&gt;We observed cases where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;agents started waiting on each other,&lt;/li&gt;
&lt;li&gt;orchestration layers became bottlenecks,&lt;/li&gt;
&lt;li&gt;duplicate reasoning increased token burn,&lt;/li&gt;
&lt;li&gt;cascading retries created operational instability,&lt;/li&gt;
&lt;li&gt;and observability became extremely difficult.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multi-agent systems introduce their own scaling complexity around coordination, governance, and orchestration overhead. Most production-grade architecture guidance today recommends keeping orchestration layers as simple as possible.&lt;/p&gt;

&lt;p&gt;Over time, we established a few practical rules of thumb internally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Some Practical Rules of Thumb We Follow Now
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Keep Tool Count Small Per Agent
&lt;/h4&gt;

&lt;p&gt;Our practical guideline today is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3–5 tools → ideal&lt;/li&gt;
&lt;li&gt;6–8 tools → manageable with careful prompting&lt;/li&gt;
&lt;li&gt;10+ tools → requires routing/filtering layers&lt;/li&gt;
&lt;li&gt;15+ tools → usually an architectural warning sign&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The issue is not model capability.&lt;/p&gt;

&lt;p&gt;It is decision dilution.&lt;/p&gt;
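
&lt;p&gt;The guideline above can be encoded as a simple check, for example in a CI step that reviews agent definitions. This is only a sketch: the function name is hypothetical, and counts the guideline does not mention (fewer than 3, or exactly 9) are reported as not covered rather than guessed.&lt;/p&gt;

```python
def tool_count_guidance(count):
    """Return the guideline band for the number of tools on one agent."""
    if count >= 15:
        return "architectural warning sign"
    if count >= 10:
        return "requires routing/filtering layers"
    if count in range(6, 9):     # 6, 7, 8
        return "manageable with careful prompting"
    if count in range(3, 6):     # 3, 4, 5
        return "ideal"
    return "not covered by the guideline"


for n in (4, 7, 12, 18):
    print(n, tool_count_guidance(n))
```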

&lt;h4&gt;
  
  
  2. Every Agent Must Have One Clear Business Responsibility
&lt;/h4&gt;

&lt;p&gt;We avoid mixing domains.&lt;/p&gt;

&lt;p&gt;For example, we avoid combining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;payments + support,&lt;/li&gt;
&lt;li&gt;analytics + execution,&lt;/li&gt;
&lt;li&gt;reporting + approvals,&lt;/li&gt;
&lt;li&gt;inventory + customer engagement.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The narrower the responsibility boundary, the more predictable the behavior.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Start With the Lowest Complexity Possible
&lt;/h4&gt;

&lt;p&gt;One important learning from enterprise orchestration patterns is this:&lt;/p&gt;

&lt;p&gt;Do not introduce multi-agent architecture unless the workflow genuinely requires it.&lt;/p&gt;

&lt;p&gt;Sometimes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a prompt is enough,&lt;/li&gt;
&lt;li&gt;a single agent is enough,&lt;/li&gt;
&lt;li&gt;or the workflow is better handled through deterministic orchestration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not every problem needs “AI teamwork.”&lt;/p&gt;

&lt;h4&gt;
  
  
  4. Avoid Excessive Agent-to-Agent Conversations
&lt;/h4&gt;

&lt;p&gt;Agent collaboration sounds powerful in demos.&lt;/p&gt;

&lt;p&gt;But in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;every interaction increases latency,&lt;/li&gt;
&lt;li&gt;every message consumes tokens,&lt;/li&gt;
&lt;li&gt;every dependency creates failure paths.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We now aggressively reduce unnecessary conversations between agents.&lt;/p&gt;

&lt;h4&gt;
  
  
  5. Retrieval Before Reasoning
&lt;/h4&gt;

&lt;p&gt;Instead of exposing all tools to all agents, we first narrow candidates through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;semantic routing,&lt;/li&gt;
&lt;li&gt;metadata filtering,&lt;/li&gt;
&lt;li&gt;RAG-based retrieval,&lt;/li&gt;
&lt;li&gt;workflow classification.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This significantly improves tool selection accuracy and reduces reasoning load.&lt;/p&gt;
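
&lt;p&gt;A rough sketch of narrowing candidates before the agent reasons. The tool catalog, domain labels, and keyword lists are made up for illustration, and the keyword match is only a stand-in for real semantic routing or a trained workflow classifier.&lt;/p&gt;

```python
# Hypothetical tool catalog with a domain tag used for metadata filtering.
TOOLS = [
    {"name": "customer_lookup", "domain": "support"},
    {"name": "ticket_creation", "domain": "support"},
    {"name": "refund_processing", "domain": "commerce"},
    {"name": "order_search", "domain": "commerce"},
    {"name": "shipping_tracking", "domain": "operations"},
]

# Naive keyword lists standing in for a semantic router.
KEYWORDS = {
    "support": ("ticket", "complaint", "help"),
    "commerce": ("refund", "order", "payment"),
    "operations": ("shipping", "inventory", "delivery"),
}


def classify(request):
    """Pick the domain whose keywords match the request most often."""
    text = request.lower()
    scores = {
        domain: sum(word in text for word in words)
        for domain, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def candidate_tools(request):
    """Return only the tools the agent should even consider."""
    domain = classify(request)
    if domain is None:
        return []  # nothing matched: ask for clarification instead
    return [tool["name"] for tool in TOOLS if tool["domain"] == domain]


print(candidate_tools("I want a refund for my order"))
```

&lt;p&gt;The agent then chooses among two tools instead of five, which is the reduction in decision space that this rule is about.&lt;/p&gt;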

&lt;h4&gt;
  
  
  6. Observability Is Mandatory
&lt;/h4&gt;

&lt;p&gt;Once systems become multi-agent, debugging becomes one of the hardest engineering problems.&lt;/p&gt;

&lt;p&gt;We now treat the following as first-class requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;distributed tracing,&lt;/li&gt;
&lt;li&gt;token tracking,&lt;/li&gt;
&lt;li&gt;step-level logging,&lt;/li&gt;
&lt;li&gt;execution replay,&lt;/li&gt;
&lt;li&gt;agent health monitoring,&lt;/li&gt;
&lt;li&gt;retry visibility,&lt;/li&gt;
&lt;li&gt;and orchestration graphs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without observability, production support becomes nearly impossible.&lt;/p&gt;

&lt;h4&gt;
  
  
  7. Human Escalation Is Still Critical
&lt;/h4&gt;

&lt;p&gt;One thing we intentionally avoid is trying to automate every decision.&lt;/p&gt;

&lt;p&gt;We now introduce human checkpoints for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;financial operations,&lt;/li&gt;
&lt;li&gt;policy-sensitive actions,&lt;/li&gt;
&lt;li&gt;low-confidence reasoning,&lt;/li&gt;
&lt;li&gt;and customer-impacting workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Autonomy without governance becomes operational risk.&lt;/p&gt;
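
&lt;p&gt;A human checkpoint can be expressed as a gate that runs before any action executes. The category names and the 0.8 confidence floor below are assumptions for illustration; the actual categories and thresholds would come from your own governance rules.&lt;/p&gt;

```python
# Hypothetical action categories that always require a person to approve.
ALWAYS_REVIEW = frozenset({"financial", "policy_sensitive", "customer_impacting"})


def requires_human(category, confidence, min_confidence=0.8):
    """Return True when the action must stop at a human checkpoint."""
    if category in ALWAYS_REVIEW:
        return True                      # sensitive regardless of confidence
    return min_confidence > confidence   # low-confidence reasoning


print(requires_human("financial", 0.99))
print(requires_human("internal_note", 0.95))
print(requires_human("internal_note", 0.50))
```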

&lt;p&gt;What I increasingly believe is that the future of enterprise AI is not one giant super-agent.&lt;/p&gt;

&lt;p&gt;It is orchestrated systems of smaller specialized agents collaborating through routing, delegation, memory sharing, and controlled execution.&lt;/p&gt;

&lt;p&gt;The real engineering challenge is no longer:&lt;br&gt;
“How many tools can an agent use?”&lt;/p&gt;

&lt;p&gt;The better question is:&lt;br&gt;
“How effectively can we reduce the decision burden for each agent while keeping orchestration manageable?”&lt;/p&gt;

&lt;p&gt;That has become one of the most important scaling lessons for us while building production-grade agentic AI systems.&lt;/p&gt;

&lt;h1&gt;
  
  
  How We Are Thinking About This in Cloud Architecture
&lt;/h1&gt;

&lt;p&gt;One important realization for us was that multi-agent systems should not be treated as a single application deployment.&lt;/p&gt;

&lt;p&gt;They should be treated as distributed cloud-native systems.&lt;/p&gt;

&lt;p&gt;That changes the architecture significantly.&lt;/p&gt;

&lt;p&gt;Today, the architecture pattern we increasingly follow looks something like this:&lt;/p&gt;

&lt;h2&gt;
  
  
  Specialized Agents as Independent Services
&lt;/h2&gt;

&lt;p&gt;Each agent runs independently with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;isolated APIs,&lt;/li&gt;
&lt;li&gt;dedicated scaling,&lt;/li&gt;
&lt;li&gt;separate observability,&lt;/li&gt;
&lt;li&gt;isolated memory/context,&lt;/li&gt;
&lt;li&gt;and domain-level permissions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This reduces blast radius and improves operational governance.&lt;/p&gt;

&lt;p&gt;In AWS, this naturally aligns very well with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda,&lt;/li&gt;
&lt;li&gt;ECS/EKS,&lt;/li&gt;
&lt;li&gt;event-driven services,&lt;/li&gt;
&lt;li&gt;queues,&lt;/li&gt;
&lt;li&gt;Bedrock,&lt;/li&gt;
&lt;li&gt;and serverless orchestration patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I personally liked while evaluating newer AWS patterns is how Amazon Bedrock AgentCore is trying to standardize several production concerns around agents. Instead of teams writing custom orchestration glue repeatedly, AgentCore is introducing managed capabilities around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;runtime isolation,&lt;/li&gt;
&lt;li&gt;observability,&lt;/li&gt;
&lt;li&gt;memory,&lt;/li&gt;
&lt;li&gt;identity,&lt;/li&gt;
&lt;li&gt;tool gateways,&lt;/li&gt;
&lt;li&gt;and orchestration patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One thing I strongly relate to from practical experience is this:&lt;/p&gt;

&lt;p&gt;Building the reasoning layer is usually not the hardest part anymore.&lt;/p&gt;

&lt;p&gt;The harder part is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;orchestration,&lt;/li&gt;
&lt;li&gt;debugging,&lt;/li&gt;
&lt;li&gt;tracing,&lt;/li&gt;
&lt;li&gt;retries,&lt;/li&gt;
&lt;li&gt;governance,&lt;/li&gt;
&lt;li&gt;and operational scalability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is where systems usually become unstable at scale.&lt;/p&gt;

&lt;p&gt;AWS AgentCore Observability is also moving in an interesting direction by treating agent execution visibility as a first-class production capability with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;execution tracing,&lt;/li&gt;
&lt;li&gt;token monitoring,&lt;/li&gt;
&lt;li&gt;latency tracking,&lt;/li&gt;
&lt;li&gt;tool usage visibility,&lt;/li&gt;
&lt;li&gt;and CloudWatch integration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you have multiple agents collaborating dynamically, you need visibility into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why a tool was selected,&lt;/li&gt;
&lt;li&gt;which agent delegated the task,&lt;/li&gt;
&lt;li&gt;what context was shared,&lt;/li&gt;
&lt;li&gt;where retries happened,&lt;/li&gt;
&lt;li&gt;and why execution paths changed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this, production debugging becomes extremely difficult.&lt;/p&gt;

&lt;p&gt;Another pattern we increasingly prefer is asynchronous orchestration.&lt;/p&gt;

&lt;p&gt;Instead of tightly coupling agents synchronously, we now lean more toward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;queues,&lt;/li&gt;
&lt;li&gt;events,&lt;/li&gt;
&lt;li&gt;workflow engines,&lt;/li&gt;
&lt;li&gt;and loosely coupled communication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It improves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;resilience,&lt;/li&gt;
&lt;li&gt;scalability,&lt;/li&gt;
&lt;li&gt;retry handling,&lt;/li&gt;
&lt;li&gt;and fault isolation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most importantly, it prevents one overloaded agent from slowing down the entire system.&lt;/p&gt;
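
&lt;p&gt;The queue-based idea can be shown in miniature with an in-process &lt;code&gt;asyncio.Queue&lt;/code&gt;. This is a sketch only: in a real deployment the queue would be a managed service rather than an in-memory object, and the worker names and task labels are placeholders.&lt;/p&gt;

```python
import asyncio


async def worker(name, queue, results):
    """Pull tasks off the shared queue until cancelled."""
    while True:
        task = await queue.get()
        try:
            await asyncio.sleep(0)          # stand-in for the agent's real work
            results.append((name, task))
        finally:
            queue.task_done()               # lets queue.join() make progress


async def run(tasks, worker_count=2):
    """Fan tasks out to loosely coupled workers and wait for completion."""
    queue = asyncio.Queue()
    results = []
    workers = [
        asyncio.create_task(worker(f"agent-{i}", queue, results))
        for i in range(worker_count)
    ]
    for task in tasks:
        queue.put_nowait(task)
    await queue.join()                      # wait until every task is processed
    for w in workers:
        w.cancel()                          # workers loop forever, so stop them
    await asyncio.gather(*workers, return_exceptions=True)
    return results


results = asyncio.run(run(["lookup", "refund", "notify"]))
print(sorted(task for _, task in results))
```

&lt;p&gt;The producer never calls a worker directly. It only enqueues work, so a slow worker delays its own tasks rather than blocking the caller.&lt;/p&gt;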


</description>
      <category>agents</category>
      <category>aws</category>
      <category>agentcore</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Technical debt handling</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 11 May 2026 13:15:12 +0000</pubDate>
      <link>https://dev.to/aws-builders/technical-debt-handling-38on</link>
      <guid>https://dev.to/aws-builders/technical-debt-handling-38on</guid>
      <description>&lt;p&gt;Over the years, my opinion on technical debt has changed a lot. Earlier, I used to think technical debt meant bad engineering decisions.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Now I think differently&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In product companies, especially fast-moving SaaS and AI products, some level of technical debt is unavoidable. If teams try to make everything perfect from day one, they usually move too slowly.&lt;br&gt;
The real problem is not technical debt.&lt;br&gt;
The real problem is when nobody knows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why the shortcut was taken&lt;/li&gt;
&lt;li&gt;how long it can survive&lt;/li&gt;
&lt;li&gt;what impact it will create later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Personally, I look at technical debt in 3 broad categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strategic debt: Shortcuts taken consciously to move faster, validate ideas, or release quickly.&lt;/li&gt;
&lt;li&gt;Operational debt: Things that slowly start hurting deployments, production stability, debugging, support effort, and developer productivity.&lt;/li&gt;
&lt;li&gt;Architectural debt: This is the one that becomes dangerous over time. Scaling becomes harder, integrations become messy, releases become slower, and every new feature starts feeling more expensive to build.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I feel AI products make this even more complicated. In normal SaaS systems, debt usually impacts engineering speed. But in AI systems, technical debt can directly affect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;response quality&lt;/li&gt;
&lt;li&gt;hallucination handling&lt;/li&gt;
&lt;li&gt;latency&lt;/li&gt;
&lt;li&gt;observability&lt;/li&gt;
&lt;li&gt;model cost&lt;/li&gt;
&lt;li&gt;evaluation consistency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And because AI systems are probabilistic, debugging becomes much harder compared to traditional software.&lt;/p&gt;

&lt;p&gt;I’ve also seen SaaS platforms suffer heavily from invisible debt because of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multi-tenant complexity&lt;/li&gt;
&lt;li&gt;customer-specific customizations&lt;/li&gt;
&lt;li&gt;integrations&lt;/li&gt;
&lt;li&gt;deployment dependencies&lt;/li&gt;
&lt;li&gt;security and compliance requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One weak architectural decision early on can create pain for years.&lt;/p&gt;

&lt;p&gt;That’s why I personally prefer making technical debt visible and measurable instead of treating it as a future problem.&lt;/p&gt;

&lt;p&gt;Some of the signals I usually watch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deployment friction&lt;/li&gt;
&lt;li&gt;rollback frequency&lt;/li&gt;
&lt;li&gt;incident trends&lt;/li&gt;
&lt;li&gt;onboarding difficulty for new engineers&lt;/li&gt;
&lt;li&gt;release confidence&lt;/li&gt;
&lt;li&gt;overall engineering velocity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One pattern I’ve noticed repeatedly:&lt;br&gt;
When team size keeps increasing but delivery speed keeps dropping, technical debt is already affecting the organization.&lt;/p&gt;

</description>
      <category>design</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Learnings while working with long-running AI agents</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 11 May 2026 13:12:53 +0000</pubDate>
      <link>https://dev.to/aws-builders/learnings-while-working-with-long-running-ai-agents-pi9</link>
      <guid>https://dev.to/aws-builders/learnings-while-working-with-long-running-ai-agents-pi9</guid>
      <description>&lt;p&gt;One of my biggest learnings while working with long-running AI agents is that logging and progress reporting are not optional features when the agent is tightly coupled with a UI — they are part of the product experience itself.&lt;/p&gt;

&lt;p&gt;Initially, I used to think of logging mainly from a debugging or engineering perspective. But with agentic systems, especially long-running workflows involving multiple tools, reasoning steps, APIs, retries, or multi-agent coordination, I realized users experience “silence” very differently than they do in traditional applications.&lt;br&gt;
When an agent takes 30 seconds, 2 minutes, or longer without visible progress, users immediately start questioning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the system stuck?&lt;/li&gt;
&lt;li&gt;Did my request fail?&lt;/li&gt;
&lt;li&gt;Is it doing the wrong thing?&lt;/li&gt;
&lt;li&gt;Should I refresh or retry?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That uncertainty destroys trust very quickly.&lt;br&gt;
I learned that users do not just want the final answer — they want confidence that the system is actively working toward the answer. Progress visibility creates psychological assurance. Even simple updates like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Analyzing uploaded documents…”&lt;/li&gt;
&lt;li&gt;“Fetching data from CRM…”&lt;/li&gt;
&lt;li&gt;“Generating recommendations…”&lt;/li&gt;
&lt;li&gt;“Validating final response…”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;dramatically improve user confidence and patience.&lt;br&gt;
Another major realization was that long-running agents are fundamentally non-deterministic systems. Unlike traditional APIs, agents can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;take different execution paths,&lt;/li&gt;
&lt;li&gt;loop through reasoning,&lt;/li&gt;
&lt;li&gt;invoke tools dynamically,&lt;/li&gt;
&lt;li&gt;retry failed steps,&lt;/li&gt;
&lt;li&gt;or spend time resolving ambiguity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without structured logging and traceability, debugging becomes extremely difficult because the same input may not always produce the same internal execution path. Modern AI observability practices emphasize tracing tool calls, reasoning paths, latency, token usage, and execution flow because agent behavior is inherently complex and probabilistic.&lt;/p&gt;

&lt;p&gt;I also learned that progress reporting is not only for users — it becomes equally important for engineering and operational visibility. Once agents move into production, observability helps teams identify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;where workflows slow down,&lt;/li&gt;
&lt;li&gt;which tool calls fail,&lt;/li&gt;
&lt;li&gt;why latency spikes happen,&lt;/li&gt;
&lt;li&gt;and where hallucinations or execution deviations originate. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One practical lesson I learned is that UI-integrated agents should expose execution state intentionally, not dump raw logs. There is a difference between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;engineering telemetry,&lt;/li&gt;
&lt;li&gt;operational traces,&lt;/li&gt;
&lt;li&gt;and user-friendly progress communication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Users need understandable milestones, while engineers need deep execution traces.&lt;br&gt;
Another important learning was around perceived performance. In many cases, improving progress visibility improved user satisfaction more than reducing actual latency. A 90-second process with clear step-by-step reporting often feels faster and more reliable than a silent 40-second execution.&lt;/p&gt;

&lt;p&gt;Today, I strongly believe that for long-running AI agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;logging is part of reliability,&lt;/li&gt;
&lt;li&gt;progress reporting is part of UX,&lt;/li&gt;
&lt;li&gt;and observability is part of trust.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>genai</category>
      <category>agents</category>
      <category>aws</category>
    </item>
    <item>
      <title>Building a Hybrid AWS Microservices Platform with API Gateway, Lambda, ECS, and Load Balancers</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 20 Apr 2026 18:41:39 +0000</pubDate>
      <link>https://dev.to/amitkayal/building-a-hybrid-aws-microservices-platform-with-api-gateway-lambda-ecs-and-load-balancers-mnn</link>
      <guid>https://dev.to/amitkayal/building-a-hybrid-aws-microservices-platform-with-api-gateway-lambda-ecs-and-load-balancers-mnn</guid>
      <description>&lt;h1&gt;
  
  
  Building a Hybrid AWS Microservices Platform with API Gateway, Lambda, ECS, and Load Balancers
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When teams start splitting a large backend into smaller services, the first infrastructure question is usually not "How do we build a microservice?" but "How do we expose many different services safely, consistently, and without creating a networking mess?"&lt;/p&gt;

&lt;p&gt;Our architecture provides a practical answer to that problem using a hybrid AWS design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API Gateway as the front door&lt;/li&gt;
&lt;li&gt;Lambda for lightweight serverless capabilities and supporting workflows&lt;/li&gt;
&lt;li&gt;ECS Fargate for containerized business services&lt;/li&gt;
&lt;li&gt;Internal load balancers for private service routing&lt;/li&gt;
&lt;li&gt;Terraform for repeatable, staged infrastructure delivery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important architectural idea is separation of concerns. Public access, authentication, routing, container execution, and service discovery are all handled by different layers. That keeps the platform easier to scale and much easier to evolve as the number of services grows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Pattern
&lt;/h2&gt;

&lt;p&gt;At a high level, the platform follows this flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A client sends an HTTPS request to API Gateway.&lt;/li&gt;
&lt;li&gt;API Gateway applies request-level controls such as API key enforcement, CORS behavior, and route matching.&lt;/li&gt;
&lt;li&gt;The request is sent either to a Lambda-backed endpoint or to a private containerized service.&lt;/li&gt;
&lt;li&gt;For ECS services, traffic goes through a VPC Link into internal load balancing.&lt;/li&gt;
&lt;li&gt;The load balancer forwards the request to the correct ECS service based on path rules.&lt;/li&gt;
&lt;li&gt;ECS Fargate runs one or more healthy tasks for that service and returns the response.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This gives a single API surface to consumers while allowing the backend implementation to vary by use case.&lt;/p&gt;
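
&lt;p&gt;The flow above can be modelled as a toy dispatcher to make the branching explicit. Nothing here calls AWS: the routes, the API key, and the target names are invented for illustration, and the function only describes which path a request would take.&lt;/p&gt;

```python
# Hypothetical route table: each path maps to a backend type and a target.
ROUTES = {
    "/health": {"backend": "lambda", "target": "health-check-fn"},
    "/orders": {"backend": "ecs", "target": "orders-service"},
}
VALID_API_KEYS = frozenset({"demo-key"})


def dispatch(path, api_key):
    """Return (status, description) for the hop sequence a request takes."""
    if api_key not in VALID_API_KEYS:
        return (403, "rejected at API Gateway: invalid API key")
    route = ROUTES.get(path)
    if route is None:
        return (404, "rejected at API Gateway: no matching route")
    if route["backend"] == "lambda":
        return (200, "API Gateway -> Lambda " + route["target"])
    # ECS-backed routes go through the private ingress chain.
    return (200, "API Gateway -> VPC Link -> NLB -> ALB -> " + route["target"])


print(dispatch("/orders", "demo-key"))
print(dispatch("/health", "demo-key"))
```

&lt;p&gt;Consumers see one API surface in both cases; only the route table decides whether Lambda or an ECS service answers.&lt;/p&gt;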

&lt;h2&gt;
  
  
  Why Combine Lambda and ECS?
&lt;/h2&gt;

&lt;p&gt;A platform like this benefits from using both compute models rather than forcing every workload into one.&lt;/p&gt;

&lt;p&gt;Lambda is a strong fit for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lightweight request handlers&lt;/li&gt;
&lt;li&gt;event-driven tasks&lt;/li&gt;
&lt;li&gt;simple orchestration&lt;/li&gt;
&lt;li&gt;platform support functions&lt;/li&gt;
&lt;li&gt;endpoints that do not need a full container lifecycle&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ECS Fargate is a better fit for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;long-lived HTTP microservices&lt;/li&gt;
&lt;li&gt;containerized frameworks and dependencies&lt;/li&gt;
&lt;li&gt;services that need more predictable runtime behavior&lt;/li&gt;
&lt;li&gt;APIs that benefit from load balancing, health checks, and horizontal scaling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In our architecture, the design supports both. Some APIs are routed to Lambda-based services, while others are routed to ECS services defined through service configuration. That hybrid model is useful in real organizations because not all services have the same runtime needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Three-Stage Infrastructure Model
&lt;/h2&gt;

&lt;p&gt;One of the strongest ideas in our architecture is the staged Terraform layout. Instead of deploying everything together, the infrastructure is split into three layers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 1: Networking
&lt;/h3&gt;

&lt;p&gt;The first stage establishes the network foundation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VPC selection or creation&lt;/li&gt;
&lt;li&gt;public and private subnet discovery or provisioning&lt;/li&gt;
&lt;li&gt;internal Network Load Balancer&lt;/li&gt;
&lt;li&gt;internal Application Load Balancer&lt;/li&gt;
&lt;li&gt;VPC Link for API Gateway&lt;/li&gt;
&lt;li&gt;ECS task security group&lt;/li&gt;
&lt;li&gt;ALB log storage and network observability components&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This stage is intentionally infrastructure-only. No application services are deployed here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 2: Compute
&lt;/h3&gt;

&lt;p&gt;The second stage provisions the actual execution environment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ECS cluster on Fargate&lt;/li&gt;
&lt;li&gt;ECR repositories for service images&lt;/li&gt;
&lt;li&gt;target groups per service&lt;/li&gt;
&lt;li&gt;ALB listener and listener rules&lt;/li&gt;
&lt;li&gt;ECS service definitions&lt;/li&gt;
&lt;li&gt;CloudWatch log groups&lt;/li&gt;
&lt;li&gt;Lambda functions used by the platform&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This stage consumes outputs from the networking stage so the compute layer never hardcodes network assumptions in its own design.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 3: API Gateways
&lt;/h3&gt;

&lt;p&gt;The third stage exposes services through API Gateway:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a public API for internet-facing consumption&lt;/li&gt;
&lt;li&gt;a private API for VPC-only access&lt;/li&gt;
&lt;li&gt;route creation from service metadata&lt;/li&gt;
&lt;li&gt;VPC Link integrations for containerized services&lt;/li&gt;
&lt;li&gt;Lambda proxy integrations for Lambda-backed services&lt;/li&gt;
&lt;li&gt;API keys, usage plans, and stage configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This split is operationally important. Teams can change routing without rebuilding networking, and they can add services without redesigning the entire platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Request Path for ECS Services
&lt;/h2&gt;

&lt;p&gt;For containerized microservices, the implementation follows a private ingress model.&lt;/p&gt;

&lt;p&gt;The path is:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Client -&amp;gt; API Gateway -&amp;gt; VPC Link -&amp;gt; internal NLB -&amp;gt; internal ALB -&amp;gt; ECS service -&amp;gt; ECS task&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That may look like one hop too many at first, but each layer has a purpose.&lt;/p&gt;

&lt;h3&gt;
  
  
  API Gateway
&lt;/h3&gt;

&lt;p&gt;API Gateway is the public entry point. It handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TLS termination at the edge&lt;/li&gt;
&lt;li&gt;route exposure&lt;/li&gt;
&lt;li&gt;API key enforcement&lt;/li&gt;
&lt;li&gt;request and header mapping&lt;/li&gt;
&lt;li&gt;CORS handling&lt;/li&gt;
&lt;li&gt;stage-based deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It gives consumers a stable API contract while keeping the backend private.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why a VPC Link Is Used
&lt;/h3&gt;

&lt;p&gt;ECS services are not exposed directly to the internet. Instead, API Gateway connects privately into the VPC using a VPC Link. That allows the public API layer to reach internal services without making the services themselves public.&lt;/p&gt;

&lt;p&gt;This is a strong security pattern because the application runtime stays inside the VPC, but consumers still get a clean managed API endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why the Repository Uses Both NLB and ALB
&lt;/h3&gt;

&lt;p&gt;A useful implementation detail in our architecture is that the VPC Link targets an internal Network Load Balancer, and that NLB forwards to an internal Application Load Balancer.&lt;/p&gt;

&lt;p&gt;This arrangement provides two separate benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The NLB is used as the stable target for the API Gateway VPC Link.&lt;/li&gt;
&lt;li&gt;The ALB performs path-based routing to the actual microservices.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ALB is what makes many ECS services practical behind one internal entry point. Each service gets its own listener rule and target group, so the platform can route based on URL path rather than provisioning a separate load balancer per service.&lt;/p&gt;
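
&lt;p&gt;The NLB-to-ALB chain can be sketched roughly as follows, assuming a REST API VPC Link and load balancers created elsewhere. The variable names, resource names, port, and health check path are illustrative assumptions.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;variable "vpc_id" { type = string }
variable "internal_nlb_arn" { type = string }
variable "internal_alb_arn" { type = string }

# NLB target group whose single target is the internal ALB.
resource "aws_lb_target_group" "nlb_to_alb" {
  name        = "example-nlb-to-alb"
  target_type = "alb"
  protocol    = "TCP"
  port        = 80
  vpc_id      = var.vpc_id

  # ALB-type target groups are health-checked over HTTP or HTTPS.
  health_check {
    protocol = "HTTP"
    path     = "/health" # assumed path that the ALB answers
  }
}

# The ALB must already have a listener on this port.
resource "aws_lb_target_group_attachment" "alb" {
  target_group_arn = aws_lb_target_group.nlb_to_alb.arn
  target_id        = var.internal_alb_arn
  port             = 80
}

resource "aws_lb_listener" "nlb_tcp" {
  load_balancer_arn = var.internal_nlb_arn
  protocol          = "TCP"
  port              = 80

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.nlb_to_alb.arn
  }
}

# REST API VPC Link pointing at the internal NLB.
resource "aws_api_gateway_vpc_link" "this" {
  name        = "example-vpc-link"
  target_arns = [var.internal_nlb_arn]
}
&lt;/code&gt;&lt;/pre&gt;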

&lt;h2&gt;
  
  
  How Load Balancing Works
&lt;/h2&gt;

&lt;p&gt;The load-balancing model is service-oriented.&lt;/p&gt;

&lt;p&gt;Each ECS microservice contributes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a base API path&lt;/li&gt;
&lt;li&gt;an ALB path pattern&lt;/li&gt;
&lt;li&gt;a listener rule priority&lt;/li&gt;
&lt;li&gt;a container port&lt;/li&gt;
&lt;li&gt;a health check definition&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From that metadata, Terraform creates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one target group per service&lt;/li&gt;
&lt;li&gt;one listener rule per service&lt;/li&gt;
&lt;li&gt;one ECS service per service&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means the routing layer is not manually duplicated for every new microservice. The service declares its path and runtime settings, and the platform generates the infrastructure around it.&lt;/p&gt;
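
&lt;p&gt;A minimal sketch of that generation step, assuming service metadata is available as a Terraform map. The field names, the &lt;code&gt;quoting&lt;/code&gt; entry, and the variable names are hypothetical, not the repository's actual schema.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;variable "vpc_id" { type = string }
variable "alb_listener_arn" { type = string }

# Hypothetical per-service metadata; field names are illustrative.
variable "services" {
  type = map(object({
    path_pattern      = string
    listener_priority = number
    container_port    = number
    health_check_path = string
  }))
  default = {
    quoting = {
      path_pattern      = "/quoting/*"
      listener_priority = 100
      container_port    = 8080
      health_check_path = "/quoting/health"
    }
  }
}

# One target group per service.
resource "aws_lb_target_group" "service" {
  for_each    = var.services
  name        = "svc-${each.key}"
  vpc_id      = var.vpc_id
  protocol    = "HTTP"
  port        = each.value.container_port
  target_type = "ip" # Fargate tasks register by IP address

  health_check {
    path    = each.value.health_check_path
    matcher = "200"
  }
}

# One path-based listener rule per service.
resource "aws_lb_listener_rule" "service" {
  for_each     = var.services
  listener_arn = var.alb_listener_arn
  priority     = each.value.listener_priority

  condition {
    path_pattern {
      values = [each.value.path_pattern]
    }
  }

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.service[each.key].arn
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Adding a service then means adding one map entry. Listener priorities must be unique per listener, which is why each service declares its own.&lt;/p&gt;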

&lt;h3&gt;
  
  
  Target Groups
&lt;/h3&gt;

&lt;p&gt;Each target group points to ECS tasks using IP targets. That is the correct choice for Fargate because each task runs with its own elastic network interface rather than on a shared EC2 host.&lt;/p&gt;

&lt;p&gt;The target groups in this repository also use application-level health checks. A task is considered healthy only when its service endpoint responds successfully on the configured health path.&lt;/p&gt;

&lt;p&gt;That matters because container startup is not the same as application readiness. A service may be running from ECS's perspective but still not ready to receive traffic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Listener Rules
&lt;/h3&gt;

&lt;p&gt;The ALB listener is configured once, and each service gets a path-based rule. For example, a service under a quoting path can be matched independently from a service under a product-pricing path.&lt;/p&gt;

&lt;p&gt;This keeps the routing layer centralized and avoids deploying a dedicated ALB per service, which would become expensive and operationally noisy as the platform grows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Health Checks and Traffic Protection
&lt;/h3&gt;

&lt;p&gt;The repository uses health checks in multiple places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API health endpoints at the application level&lt;/li&gt;
&lt;li&gt;ALB target group health checks&lt;/li&gt;
&lt;li&gt;ECS service health grace periods&lt;/li&gt;
&lt;li&gt;container health checks inside the task definition&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That layered approach improves resilience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unhealthy tasks are removed from target groups&lt;/li&gt;
&lt;li&gt;ECS replaces failed tasks&lt;/li&gt;
&lt;li&gt;API Gateway continues to route through the same private entry point&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a platform that can recover from task-level failures without changing the public API contract.&lt;/p&gt;

&lt;h2&gt;
  
  
  How ECS Is Structured
&lt;/h2&gt;

&lt;p&gt;The ECS side of the platform is built for repeatability rather than one-off service definitions.&lt;/p&gt;

&lt;h3&gt;
  
  
  ECS Cluster
&lt;/h3&gt;

&lt;p&gt;The platform provisions a shared ECS cluster per environment. That allows multiple microservices to run within the same operational boundary while still being isolated at the task and service level.&lt;/p&gt;

&lt;p&gt;The cluster uses Fargate, which removes the need to manage EC2 worker nodes. This simplifies operations significantly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no patching of container hosts&lt;/li&gt;
&lt;li&gt;no cluster capacity management at the instance level&lt;/li&gt;
&lt;li&gt;easier scaling by task count&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reusable ECS Service Module
&lt;/h3&gt;

&lt;p&gt;Instead of defining each ECS service from scratch, the repository uses a reusable Terraform module for service deployment.&lt;/p&gt;

&lt;p&gt;That module is responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;task definition creation&lt;/li&gt;
&lt;li&gt;container logging configuration&lt;/li&gt;
&lt;li&gt;IAM role wiring&lt;/li&gt;
&lt;li&gt;ECS service creation&lt;/li&gt;
&lt;li&gt;target group attachment&lt;/li&gt;
&lt;li&gt;subnet and security group placement&lt;/li&gt;
&lt;li&gt;optional capacity provider strategy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a strong platform choice. It makes service onboarding consistent and reduces drift between services.&lt;/p&gt;

&lt;h3&gt;
  
  
  Task Definitions
&lt;/h3&gt;

&lt;p&gt;Each service runs as a Fargate task with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a named container image from ECR&lt;/li&gt;
&lt;li&gt;CPU and memory settings&lt;/li&gt;
&lt;li&gt;environment variables&lt;/li&gt;
&lt;li&gt;a health check command&lt;/li&gt;
&lt;li&gt;CloudWatch logging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repository also includes support for an additional X-Ray sidecar container in the task definition pattern, which is useful for distributed tracing in a microservice environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Network Mode
&lt;/h3&gt;

&lt;p&gt;Tasks run with &lt;code&gt;awsvpc&lt;/code&gt; networking, which gives each task its own network interface and private IP. This is the standard model for ECS on Fargate and is what allows ALB target groups to use IP mode cleanly.&lt;/p&gt;
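
&lt;p&gt;Putting the task definition, health check, logging, and &lt;code&gt;awsvpc&lt;/code&gt; placement together, a Fargate service might look roughly like the sketch below. All names, sizes, ports, the log group, and the health check command are assumptions; in particular, the command assumes &lt;code&gt;curl&lt;/code&gt; is present in the image.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;variable "cluster_arn" { type = string }
variable "image" { type = string } # ECR image URI including tag
variable "execution_role_arn" { type = string }
variable "subnet_ids" { type = list(string) }
variable "task_security_group_id" { type = string }
variable "target_group_arn" { type = string }

resource "aws_ecs_task_definition" "service" {
  family                   = "example-service"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc" # each task gets its own ENI and private IP
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = var.execution_role_arn

  container_definitions = jsonencode([{
    name         = "app"
    image        = var.image
    essential    = true
    portMappings = [{ containerPort = 8080, protocol = "tcp" }]
    healthCheck = {
      command = ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
    }
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = "/ecs/example-service"
        "awslogs-region"        = "us-east-1"
        "awslogs-stream-prefix" = "app"
      }
    }
  }])
}

resource "aws_ecs_service" "service" {
  name            = "example-service"
  cluster         = var.cluster_arn
  task_definition = aws_ecs_task_definition.service.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  # Give the application time to start before ALB health checks count.
  health_check_grace_period_seconds = 60

  network_configuration {
    subnets          = var.subnet_ids
    security_groups  = [var.task_security_group_id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = var.target_group_arn
    container_name   = "app"
    container_port   = 8080
  }
}
&lt;/code&gt;&lt;/pre&gt;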

&lt;h2&gt;
  
  
  Subnet and Security Group Design
&lt;/h2&gt;

&lt;p&gt;This repository supports both existing/default VPC usage and a more segmented custom VPC model.&lt;/p&gt;

&lt;p&gt;That flexibility matters because many teams start in a default-VPC or dev-friendly setup and later move to stricter network isolation for staging and production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Subnet Placement
&lt;/h3&gt;

&lt;p&gt;The network layer discovers public and private subnets where available. In a custom VPC, the design supports proper private subnet deployment. In a simpler default VPC setup, the platform can fall back to available public subnets when private ones are not present.&lt;/p&gt;

&lt;p&gt;This is an important operational nuance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;development environments often optimize for simplicity&lt;/li&gt;
&lt;li&gt;higher environments usually optimize for stricter isolation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repository is built to handle both.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Groups
&lt;/h3&gt;

&lt;p&gt;The security model follows least-privilege intent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ECS tasks accept application traffic from the internal load-balancing layer&lt;/li&gt;
&lt;li&gt;services are not directly internet-facing&lt;/li&gt;
&lt;li&gt;API Gateway reaches backend services through private network integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps the application tier out of direct public exposure while still allowing a public API facade.&lt;/p&gt;
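
&lt;p&gt;One way to express the first rule is a task security group that only admits traffic from the internal ALB's security group. This is a sketch under assumed names and an assumed application port of 8080, not the repository's actual rule set.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;variable "vpc_id" { type = string }
variable "alb_security_group_id" { type = string }

resource "aws_security_group" "ecs_tasks" {
  name_prefix = "example-ecs-tasks-"
  vpc_id      = var.vpc_id

  # Only the internal ALB may reach the application port.
  ingress {
    description     = "App traffic from the internal ALB"
    protocol        = "tcp"
    from_port       = 8080
    to_port         = 8080
    security_groups = [var.alb_security_group_id]
  }

  # Outbound access for image pulls, logging, and downstream calls.
  egress {
    protocol    = "-1"
    from_port   = 0
    to_port     = 0
    cidr_blocks = ["0.0.0.0/0"]
  }
}
&lt;/code&gt;&lt;/pre&gt;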

&lt;h2&gt;
  
  
  Config-Driven Service Onboarding
&lt;/h2&gt;

&lt;p&gt;One of the most scalable ideas in our architecture is that services are registered through configuration rather than by handcrafting infrastructure every time.&lt;/p&gt;

&lt;p&gt;There is a master service registry that lists enabled services per environment, and each service provides its own deployment metadata, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;service identity&lt;/li&gt;
&lt;li&gt;container port&lt;/li&gt;
&lt;li&gt;desired task count&lt;/li&gt;
&lt;li&gt;CPU and memory&lt;/li&gt;
&lt;li&gt;API base path&lt;/li&gt;
&lt;li&gt;ALB path pattern&lt;/li&gt;
&lt;li&gt;listener priority&lt;/li&gt;
&lt;li&gt;health check behavior&lt;/li&gt;
&lt;li&gt;logging retention&lt;/li&gt;
&lt;li&gt;autoscaling preferences&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates a platform model rather than a collection of unrelated microservices.&lt;/p&gt;

&lt;p&gt;Adding a new service becomes a repeatable process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create the service.&lt;/li&gt;
&lt;li&gt;Define its configuration.&lt;/li&gt;
&lt;li&gt;Register it in the service registry.&lt;/li&gt;
&lt;li&gt;Build and publish the image.&lt;/li&gt;
&lt;li&gt;Apply Terraform stages.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is much easier to maintain than cloning infrastructure blocks over and over.&lt;/p&gt;
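
&lt;p&gt;To illustrate the registry idea, the sketch below drives one resource type (CloudWatch log groups) from a per-environment list of enabled services plus per-service metadata. The variable shapes, service names, and retention values are hypothetical; the actual registry format may differ.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;variable "environment" {
  type    = string
  default = "dev"
}

# Hypothetical registry: environment name to the services enabled there.
variable "service_registry" {
  type = map(list(string))
  default = {
    dev  = ["quoting", "product-pricing"]
    prod = ["quoting"]
  }
}

# Hypothetical per-service metadata (only one field shown).
variable "service_metadata" {
  type = map(object({ log_retention_days = number }))
  default = {
    "quoting"         = { log_retention_days = 30 }
    "product-pricing" = { log_retention_days = 14 }
  }
}

# One log group per service enabled in the current environment.
resource "aws_cloudwatch_log_group" "service" {
  for_each          = toset(var.service_registry[var.environment])
  name              = "/ecs/${var.environment}/${each.key}"
  retention_in_days = var.service_metadata[each.key].log_retention_days
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The same &lt;code&gt;for_each&lt;/code&gt; key can drive target groups, listener rules, and ECS services, so enabling a service in an environment is a one-line registry change.&lt;/p&gt;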

&lt;h2&gt;
  
  
  Container Delivery with ECR
&lt;/h2&gt;

&lt;p&gt;For ECS workloads, the container supply chain is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build the service image.&lt;/li&gt;
&lt;li&gt;Push it to an ECR repository.&lt;/li&gt;
&lt;li&gt;Reference the tagged image in the ECS task definition.&lt;/li&gt;
&lt;li&gt;Update the ECS service to roll out the new task definition.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Our platform provisions one ECR repository per service, with image scanning enabled. That is a good baseline for a microservices platform because it keeps artifacts separated by service while still following a common naming convention.&lt;/p&gt;
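
&lt;p&gt;A minimal sketch of per-service repositories with scan-on-push, assuming a list of service names and an illustrative naming prefix:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical service list; in practice this would come from the service registry.
variable "service_names" {
  type    = list(string)
  default = ["quoting", "product-pricing"]
}

resource "aws_ecr_repository" "service" {
  for_each = toset(var.service_names)
  name     = "example-platform/${each.key}" # assumed naming convention

  image_scanning_configuration {
    scan_on_push = true
  }
}
&lt;/code&gt;&lt;/pre&gt;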

&lt;p&gt;There is also an explicit deployment phase between infrastructure provisioning and API exposure where container images are built and pushed. That is a practical real-world step many diagrams omit, but it is essential because ECS cannot run a service until the image exists in the registry.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Lambda Fits into the Platform
&lt;/h2&gt;

&lt;p&gt;Lambda is used here as a first-class platform option, not as an afterthought.&lt;/p&gt;

&lt;p&gt;There are two useful Lambda patterns in our architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Lambda as an API Backend
&lt;/h3&gt;

&lt;p&gt;Some services can be exposed through API Gateway using Lambda proxy integration. This is ideal for capabilities that are naturally event-driven, lightweight, or operationally simpler as functions than as always-on containers.&lt;/p&gt;

&lt;p&gt;In this model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API Gateway owns the route&lt;/li&gt;
&lt;li&gt;Lambda executes the business logic&lt;/li&gt;
&lt;li&gt;API Gateway returns the Lambda response directly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This avoids unnecessary load-balancer and container overhead for smaller workloads.&lt;/p&gt;
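
&lt;p&gt;For a REST API, the Lambda proxy wiring can be sketched as below. The variables stand in for resources defined elsewhere and are assumptions, as is the choice of &lt;code&gt;ANY&lt;/code&gt; for the HTTP method.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;variable "rest_api_id" { type = string }
variable "resource_id" { type = string }
variable "rest_api_execution_arn" { type = string }
variable "lambda_invoke_arn" { type = string }
variable "lambda_function_name" { type = string }

resource "aws_api_gateway_method" "proxy" {
  rest_api_id   = var.rest_api_id
  resource_id   = var.resource_id
  http_method   = "ANY"
  authorization = "NONE"
}

resource "aws_api_gateway_integration" "lambda" {
  rest_api_id             = var.rest_api_id
  resource_id             = var.resource_id
  http_method             = aws_api_gateway_method.proxy.http_method
  type                    = "AWS_PROXY"
  integration_http_method = "POST" # Lambda invocations always use POST
  uri                     = var.lambda_invoke_arn
}

# Allow API Gateway to invoke the function.
resource "aws_lambda_permission" "apigw" {
  statement_id  = "AllowApiGatewayInvoke"
  action        = "lambda:InvokeFunction"
  function_name = var.lambda_function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${var.rest_api_execution_arn}/*/*"
}
&lt;/code&gt;&lt;/pre&gt;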

&lt;h3&gt;
  
  
  2. Lambda as a Platform Support Function
&lt;/h3&gt;

&lt;p&gt;Our architecture also provisions Lambda functions that support the overall platform, such as authentication-related or onboarding-related workflows.&lt;/p&gt;

&lt;p&gt;This is a smart use of Lambda in a hybrid platform because not every supporting concern needs to run inside ECS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Authentication and API Protection
&lt;/h2&gt;

&lt;p&gt;Our architecture clearly treats API protection as an API Gateway concern.&lt;/p&gt;

&lt;p&gt;The current public API implementation enforces API key usage through API Gateway methods, API keys, and usage plans. The codebase also provisions a supporting API key validation Lambda function and related permissions, which shows the platform is designed to accommodate Lambda-based validation flows where needed.&lt;/p&gt;

&lt;p&gt;The important architectural takeaway is this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keep authentication and traffic governance at the gateway layer&lt;/li&gt;
&lt;li&gt;keep service containers focused on business logic&lt;/li&gt;
&lt;li&gt;keep private workloads private&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That separation keeps the platform easier to secure and easier to reason about.&lt;/p&gt;
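
&lt;p&gt;The API key and usage plan part of this can be sketched as follows, assuming a REST API and a deployed stage defined elsewhere. Names and throttle limits are illustrative only.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;variable "rest_api_id" { type = string }
variable "stage_name" { type = string }

resource "aws_api_gateway_api_key" "client" {
  name = "example-client-key"
}

resource "aws_api_gateway_usage_plan" "standard" {
  name = "example-standard-plan"

  api_stages {
    api_id = var.rest_api_id
    stage  = var.stage_name
  }

  throttle_settings {
    rate_limit  = 50  # steady-state requests per second (assumed)
    burst_limit = 100 # assumed
  }
}

# Associate the key with the plan.
resource "aws_api_gateway_usage_plan_key" "client" {
  key_id        = aws_api_gateway_api_key.client.id
  key_type      = "API_KEY"
  usage_plan_id = aws_api_gateway_usage_plan.standard.id
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Keys are only enforced on methods that set &lt;code&gt;api_key_required = true&lt;/code&gt; on their &lt;code&gt;aws_api_gateway_method&lt;/code&gt; resource.&lt;/p&gt;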

&lt;h2&gt;
  
  
  Public and Private API Models
&lt;/h2&gt;

&lt;p&gt;Another strength of our architecture is that it supports both public and private APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Public API
&lt;/h3&gt;

&lt;p&gt;The public API is intended for internet-facing access. It handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;external client access&lt;/li&gt;
&lt;li&gt;API keys and usage plans&lt;/li&gt;
&lt;li&gt;CORS behavior&lt;/li&gt;
&lt;li&gt;Lambda and ECS route exposure&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Private API
&lt;/h3&gt;

&lt;p&gt;The private API is intended for internal or VPC-scoped access. It is useful when services should only be reachable from trusted network boundaries such as internal AWS workloads, integration environments, or enterprise connectivity paths.&lt;/p&gt;

&lt;p&gt;This split is helpful when some capabilities should be public and others should remain internal even though they share the same service platform underneath.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability and Operations
&lt;/h2&gt;

&lt;p&gt;A microservices platform is only as good as its operational visibility.&lt;/p&gt;

&lt;p&gt;Our architecture includes observability at several levels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CloudWatch log groups for ECS services&lt;/li&gt;
&lt;li&gt;CloudWatch logs for Lambda functions&lt;/li&gt;
&lt;li&gt;API Gateway stage logging&lt;/li&gt;
&lt;li&gt;ALB logging support&lt;/li&gt;
&lt;li&gt;VPC flow logging&lt;/li&gt;
&lt;li&gt;X-Ray-friendly task patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination helps answer the most common production questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did the request reach the gateway?&lt;/li&gt;
&lt;li&gt;Was it routed to the right backend?&lt;/li&gt;
&lt;li&gt;Was the target healthy?&lt;/li&gt;
&lt;li&gt;Did the service fail or time out?&lt;/li&gt;
&lt;li&gt;Was the problem in networking, routing, or application logic?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that layered visibility, hybrid platforms become difficult to troubleshoot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling Characteristics
&lt;/h2&gt;

&lt;p&gt;This architecture scales well because each layer can evolve somewhat independently.&lt;/p&gt;

&lt;h3&gt;
  
  
  API Layer Scaling
&lt;/h3&gt;

&lt;p&gt;API Gateway absorbs public traffic without requiring the backend to manage edge-facing concerns directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  ECS Scaling
&lt;/h3&gt;

&lt;p&gt;ECS services scale by task count. Each service can define:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;desired count&lt;/li&gt;
&lt;li&gt;minimum and maximum capacity&lt;/li&gt;
&lt;li&gt;CPU and memory sizing&lt;/li&gt;
&lt;li&gt;autoscaling thresholds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means heavily used services can scale out without affecting lighter services.&lt;/p&gt;
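
&lt;p&gt;As an illustration, per-service scaling could be declared with Application Auto Scaling and a CPU target-tracking policy. The capacity bounds and the 60 percent target are assumed values, not the repository's settings.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;variable "cluster_name" { type = string }
variable "service_name" { type = string }

resource "aws_appautoscaling_target" "ecs" {
  service_namespace  = "ecs"
  resource_id        = "service/${var.cluster_name}/${var.service_name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 10
}

resource "aws_appautoscaling_policy" "cpu" {
  name               = "example-cpu-target-tracking"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value = 60 # keep average CPU near 60 percent

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;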

&lt;h3&gt;
  
  
  Platform Growth
&lt;/h3&gt;

&lt;p&gt;As more services are added, the platform does not need a new ingress pattern each time. The same path-based routing model continues to work as long as route definitions and listener priorities stay clean.&lt;/p&gt;

&lt;h2&gt;
  
  
  Alignment with AWS Well-Architected Best Practices
&lt;/h2&gt;

&lt;p&gt;This architecture also aligns well with the design principles of the AWS Well-Architected Framework.&lt;/p&gt;

&lt;h3&gt;
  
  
  Operational Excellence
&lt;/h3&gt;

&lt;p&gt;We have structured the platform so that it is operated as a system rather than as a collection of one-off deployments.&lt;/p&gt;

&lt;p&gt;This is reflected in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;staged Terraform deployments for clearer ownership and safer changes&lt;/li&gt;
&lt;li&gt;configuration-driven service onboarding&lt;/li&gt;
&lt;li&gt;consistent ECS service patterns through reusable modules&lt;/li&gt;
&lt;li&gt;standardized logging and deployment workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This reduces manual drift and makes operational changes more repeatable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security
&lt;/h3&gt;

&lt;p&gt;Security is addressed through layered controls rather than a single protection point.&lt;/p&gt;

&lt;p&gt;We have adhered to good AWS security practices by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;placing ECS services behind private networking rather than exposing them directly&lt;/li&gt;
&lt;li&gt;using API Gateway as the controlled ingress layer&lt;/li&gt;
&lt;li&gt;applying API-level protection at the gateway&lt;/li&gt;
&lt;li&gt;using security groups to limit east-west traffic&lt;/li&gt;
&lt;li&gt;supporting encrypted log and storage patterns&lt;/li&gt;
&lt;li&gt;separating public access from internal service routing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This follows the AWS principles of strong boundaries, least privilege, and defense in depth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reliability
&lt;/h3&gt;

&lt;p&gt;Reliability comes from designing for failure at the service and routing layers.&lt;/p&gt;

&lt;p&gt;We have incorporated that through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multi-AZ subnet placement&lt;/li&gt;
&lt;li&gt;load balancer health checks&lt;/li&gt;
&lt;li&gt;ECS task replacement behavior&lt;/li&gt;
&lt;li&gt;target group isolation per service&lt;/li&gt;
&lt;li&gt;decoupled gateway and backend layers&lt;/li&gt;
&lt;li&gt;staged infrastructure dependencies with clear outputs between layers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means a failing task or unhealthy target does not require the API surface itself to change.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance Efficiency
&lt;/h3&gt;

&lt;p&gt;The architecture chooses the right compute model for the right workload.&lt;/p&gt;

&lt;p&gt;That is an AWS best practice because it avoids treating all traffic the same.&lt;/p&gt;

&lt;p&gt;Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda for lighter, event-oriented, or supporting workflows&lt;/li&gt;
&lt;li&gt;ECS Fargate for containerized services that need steady HTTP handling&lt;/li&gt;
&lt;li&gt;ALB path-based routing for efficient multi-service consolidation&lt;/li&gt;
&lt;li&gt;service-specific CPU, memory, and scaling settings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This lets us tune services independently instead of overprovisioning everything at the platform level.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost Optimization
&lt;/h3&gt;

&lt;p&gt;Cost optimization is also visible in the design choices.&lt;/p&gt;

&lt;p&gt;We are not multiplying infrastructure unnecessarily. Instead, the architecture encourages shared but controlled platform components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one API layer for many services&lt;/li&gt;
&lt;li&gt;one internal routing layer for many ECS workloads&lt;/li&gt;
&lt;li&gt;shared ECS cluster patterns per environment&lt;/li&gt;
&lt;li&gt;service-level scaling instead of blanket scaling&lt;/li&gt;
&lt;li&gt;support for Fargate and optional capacity-provider strategies where appropriate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is much closer to AWS best practice than provisioning separate ingress and compute stacks for every small service.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sustainability and Maintainability
&lt;/h3&gt;

&lt;p&gt;Even when sustainability is not called out directly, maintainable designs usually consume fewer engineering and infrastructure resources over time.&lt;/p&gt;

&lt;p&gt;The architecture helps here by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reducing duplicated infrastructure definitions&lt;/li&gt;
&lt;li&gt;making service onboarding metadata-driven&lt;/li&gt;
&lt;li&gt;encouraging reuse of shared platform components&lt;/li&gt;
&lt;li&gt;keeping the public contract stable while backend services evolve&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That leads to lower long-term complexity, which is a practical form of architectural efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Pattern Works Well
&lt;/h2&gt;

&lt;p&gt;This AWS pattern is effective because it balances standardization with flexibility.&lt;/p&gt;

&lt;p&gt;It standardizes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deployment stages&lt;/li&gt;
&lt;li&gt;ingress architecture&lt;/li&gt;
&lt;li&gt;service registration&lt;/li&gt;
&lt;li&gt;load-balancer behavior&lt;/li&gt;
&lt;li&gt;logging and health checks&lt;/li&gt;
&lt;li&gt;ECS service creation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It stays flexible by allowing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda-backed endpoints&lt;/li&gt;
&lt;li&gt;ECS-backed endpoints&lt;/li&gt;
&lt;li&gt;public and private APIs&lt;/li&gt;
&lt;li&gt;different service-level scaling and runtime settings&lt;/li&gt;
&lt;li&gt;multiple environments with different networking strategies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly what a growing microservices platform needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Implementation Advice
&lt;/h2&gt;

&lt;p&gt;If you want to implement a similar architecture, a good sequence is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build the networking foundation first.&lt;/li&gt;
&lt;li&gt;Keep all service backends private.&lt;/li&gt;
&lt;li&gt;Put API Gateway in front of everything external.&lt;/li&gt;
&lt;li&gt;Use ECS Fargate for containerized APIs that benefit from long-lived service behavior.&lt;/li&gt;
&lt;li&gt;Use Lambda for support functions and lightweight endpoints.&lt;/li&gt;
&lt;li&gt;Register services through metadata, not repetitive infrastructure definitions.&lt;/li&gt;
&lt;li&gt;Use path-based ALB routing so many services can share one internal ingress layer.&lt;/li&gt;
&lt;li&gt;Add strong health checks and centralized logs before traffic grows.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key is not just choosing AWS services, but assigning each AWS service a clear responsibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Our architecture demonstrates a mature way to implement Lambda and ECS-based microservices through API Gateway without exposing backend services directly.&lt;/p&gt;

&lt;p&gt;The architecture uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;staged Terraform for separation of concerns&lt;/li&gt;
&lt;li&gt;API Gateway as the public and private API facade&lt;/li&gt;
&lt;li&gt;Lambda where serverless execution makes sense&lt;/li&gt;
&lt;li&gt;ECS Fargate for containerized microservices&lt;/li&gt;
&lt;li&gt;NLB and ALB together for private, path-aware routing&lt;/li&gt;
&lt;li&gt;config-driven onboarding for scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For teams building an enterprise microservices platform, this is a strong pattern because it supports security, operational clarity, and service growth without forcing every workload into the same runtime model.&lt;/p&gt;

&lt;p&gt;Most importantly, it turns infrastructure into a reusable platform. Once that platform is in place, adding the next service becomes much easier than adding the first one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lessons Learned
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Keeping API Gateway as the front door and backend services private makes the architecture easier to secure and easier to evolve.&lt;/li&gt;
&lt;li&gt;Using both Lambda and ECS is more practical than forcing every use case into a single compute model.&lt;/li&gt;
&lt;li&gt;Path-based routing through shared internal load balancing scales better than creating isolated ingress infrastructure for every service.&lt;/li&gt;
&lt;li&gt;Service onboarding becomes significantly easier when routing, health checks, scaling, and runtime settings are driven by configuration.&lt;/li&gt;
&lt;li&gt;Health checks, logging, and observability need to be designed from the beginning; adding them later is much harder in a distributed system.&lt;/li&gt;
&lt;li&gt;A staged infrastructure model reduces operational risk because networking, compute, and API exposure can be changed independently.&lt;/li&gt;
&lt;li&gt;Standardizing platform patterns early saves substantial effort as the number of microservices grows.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>architecture</category>
      <category>aws</category>
      <category>lambda</category>
      <category>apigateway</category>
    </item>
    <item>
      <title>Building a Practical Lambda Capacity Provider Platform: Lessons Learned from Warm Pools, Version Hygiene, and CI/CD Reality</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 20 Apr 2026 18:25:06 +0000</pubDate>
      <link>https://dev.to/amitkayal/building-a-practical-lambda-capacity-provider-platform-lessons-learned-from-warm-pools-version-1l7j</link>
      <guid>https://dev.to/amitkayal/building-a-practical-lambda-capacity-provider-platform-lessons-learned-from-warm-pools-version-1l7j</guid>
      <description>&lt;h1&gt;
  
  
  Building a Practical Lambda Capacity Provider Platform: Lessons Learned from Warm Pools, Version Hygiene, and CI/CD Reality
&lt;/h1&gt;

&lt;p&gt;There is a big difference between a slide-deck architecture and a running system you can trust on a Monday morning.&lt;/p&gt;

&lt;p&gt;This implementation captures that difference well. On paper, the idea is simple: create a shared AWS Lambda Managed Instances capacity provider, run latency-sensitive workloads on ARM64, keep the pool warm with EventBridge, prune old Lambda versions before they become operational debt, and wrap the whole thing in a GitHub Actions plus CodeBuild delivery model. In practice, each of those choices changes how you think about performance, cost, blast radius, and developer discipline.&lt;/p&gt;

&lt;p&gt;What follows is not a generic cloud post. It is the kind of write-up you produce after actually building and living with the system.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Problem We Were Solving
&lt;/h2&gt;

&lt;p&gt;Traditional Lambda is excellent when you want abstraction and convenience. It becomes less elegant when your workload is sensitive to startup time, carries heavier dependencies, or needs more predictable execution behavior under bursty load.&lt;/p&gt;

&lt;p&gt;That is where a Lambda capacity provider changes the discussion.&lt;/p&gt;

&lt;p&gt;In this implementation, the platform is built around a shared &lt;code&gt;aws_lambda_capacity_provider&lt;/code&gt; that uses ARM64 Graviton instances and auto scaling. The core idea is straightforward: instead of leaving execution placement entirely to the default Lambda fleet, we deliberately provide a managed compute pool that multiple functions can share. That gives us more control over cost-performance characteristics and lets us design around cold-start pain rather than merely complain about it.&lt;/p&gt;

&lt;p&gt;The choice is visible in the Terraform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The provider runs on &lt;code&gt;arm64&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Allowed instance types are constrained to &lt;code&gt;m6g.large&lt;/code&gt;, &lt;code&gt;m6g.xlarge&lt;/code&gt;, &lt;code&gt;m7g.large&lt;/code&gt;, and &lt;code&gt;m7g.xlarge&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Scaling is set to &lt;code&gt;Auto&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The maximum pool ceiling is set to &lt;code&gt;64&lt;/code&gt; vCPUs&lt;/li&gt;
&lt;li&gt;The capacity provider is placed in the default VPC, with unsupported Availability Zones filtered out&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point matters more than it first appears. The code explicitly excludes unsupported AZs such as &lt;code&gt;us-east-1e&lt;/code&gt;, which is a good example of operational maturity: the happy path is not enough when the service itself has placement constraints.&lt;/p&gt;

&lt;h2&gt;
  
  
  How We Actually Created the Capacity Provider
&lt;/h2&gt;

&lt;p&gt;One thing I wanted this platform to avoid was "concept architecture" with no implementation backbone. So the capacity provider here is not described abstractly. It is provisioned directly in Terraform and wired into the Lambda lifecycle in a fairly intentional way.&lt;/p&gt;

&lt;p&gt;The build starts in &lt;code&gt;terraform_file/agent_core_sync_cp.tf&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;First, the capacity provider itself is created with &lt;code&gt;aws_lambda_capacity_provider&lt;/code&gt;. The naming pattern ties it to the service and environment, which is the right instinct for multi-environment operation. The provider is tagged as shared compute for agent workloads, which matters later for discoverability and platform governance.&lt;/p&gt;

&lt;p&gt;Second, the provider is placed inside the default VPC, but not blindly. In &lt;code&gt;terraform_file/data.tf&lt;/code&gt;, the code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;discovers the default VPC&lt;/li&gt;
&lt;li&gt;fetches the default subnets&lt;/li&gt;
&lt;li&gt;inspects subnet Availability Zones one by one&lt;/li&gt;
&lt;li&gt;excludes unsupported zones such as &lt;code&gt;us-east-1e&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;optionally caps how many subnets are used&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a subtle but important design choice. Lambda Managed Instances often create one placement footprint per subnet or AZ. If you do not control subnet spread, you can end up creating more infrastructure surface area than you intended.&lt;/p&gt;
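
&lt;p&gt;The subnet discovery steps above can be sketched roughly like this. It is not the contents of &lt;code&gt;terraform_file/data.tf&lt;/code&gt;; the variable names, local names, and the default cap of three subnets are assumptions for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;variable "unsupported_azs" {
  type    = list(string)
  default = ["us-east-1e"]
}

variable "max_subnets" {
  type    = number
  default = 3 # assumed cap
}

data "aws_vpc" "default" {
  default = true
}

data "aws_subnets" "default" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.default.id]
  }
}

# Look up each subnet individually to learn its Availability Zone.
data "aws_subnet" "each" {
  for_each = toset(data.aws_subnets.default.ids)
  id       = each.value
}

locals {
  # Drop subnets in unsupported AZs; sort for a stable order.
  supported_subnet_ids = sort([
    for s in data.aws_subnet.each : s.id
    if !contains(var.unsupported_azs, s.availability_zone)
  ])

  # Optionally cap how many subnets the capacity provider spans.
  capacity_provider_subnet_ids = slice(
    local.supported_subnet_ids,
    0,
    min(length(local.supported_subnet_ids), var.max_subnets)
  )
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Sorting before slicing keeps the selected subnets stable between plans, which avoids needless replacement of whatever consumes the list.&lt;/p&gt;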

&lt;p&gt;Third, the provider uses a dedicated security group rather than inheriting something vague and accidental. The current implementation keeps outbound traffic fully open and allows inbound HTTPS. That is permissive, but it is at least explicit and repeatable. Early-stage platforms benefit from that kind of clarity.&lt;/p&gt;

&lt;p&gt;Fourth, the capacity provider gets its own operator role through &lt;code&gt;AWSLambdaManagedEC2ResourceOperator&lt;/code&gt;. That is a critical detail. Capacity providers are not just Lambda resources; they need AWS to manage the EC2-backed execution infrastructure on your behalf. If you miss that role, the platform does not really exist no matter how nice your Terraform looks.&lt;/p&gt;

&lt;p&gt;Fifth, the instance requirements are opinionated. The code forces &lt;code&gt;arm64&lt;/code&gt; and narrows the fleet to supported Graviton M-family instance types. That is one of the better engineering decisions in this implementation because it converts an architectural preference into an enforceable runtime rule.&lt;/p&gt;

&lt;p&gt;Finally, the Lambda function is attached to the capacity provider in &lt;code&gt;terraform_file/lambda_clm_router_agent.tf&lt;/code&gt; through &lt;code&gt;capacity_provider_config&lt;/code&gt;. That is where the abstraction becomes real. We are not just provisioning a pool and hoping someone uses it later. We are explicitly binding a published Lambda to that pool and tuning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory GiB per vCPU&lt;/li&gt;
&lt;li&gt;max concurrency per execution environment&lt;/li&gt;
&lt;li&gt;ARM64 runtime alignment&lt;/li&gt;
&lt;li&gt;published versioning through Lambda aliases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the full loop: provision shared compute, constrain placement, grant AWS the operator role it needs, attach live functions to the pool, and then manage the resulting version sprawl with automation. That is what makes this feel like a platform artifact rather than a loose Terraform experiment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 1: A Capacity Provider Is Not a Tuning Knob. It Is an Operating Model.
&lt;/h2&gt;

&lt;p&gt;Teams often talk about capacity providers as if they are just a performance optimization. That framing is too shallow.&lt;/p&gt;

&lt;p&gt;The moment you move Lambda onto managed instances, you are no longer only buying faster startup. You are adopting a new operating model with very clear implications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You now care about instance family compatibility&lt;/li&gt;
&lt;li&gt;You need to think about subnet strategy and AZ support&lt;/li&gt;
&lt;li&gt;You have to reason about pool scaling ceilings, concurrency, and memory per vCPU&lt;/li&gt;
&lt;li&gt;You are effectively blending serverless ergonomics with infrastructure accountability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This implementation shows that transition clearly. The CLM router Lambda is not just declared with a runtime and handler. It is attached to the shared capacity provider and explicitly tuned with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;execution_environment_memory_gib_per_vcpu&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;per_execution_environment_max_concurrency&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;publish = true&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;architectures = ["arm64"]&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the tell. Once we start specifying how execution environments should behave, we are no longer simply "deploying a Lambda." We are shaping compute economics.&lt;/p&gt;

&lt;p&gt;The practical lesson here is simple: if you adopt Lambda Managed Instances, treat it like platform engineering, not like a runtime checkbox.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 2: ARM64 Delivers Real Value, but Only if You Respect Service Constraints
&lt;/h2&gt;

&lt;p&gt;One of the strongest decisions in this implementation is the bias toward Graviton. For Python-heavy agent workloads, ARM64 is usually the right default. The economics are better, and the performance-per-dollar story is often compelling.&lt;/p&gt;

&lt;p&gt;But there is an important nuance that the Terraform comments correctly capture: not every EC2 family you might expect is supported in the way you assume. This implementation explicitly avoids unsupported combinations and narrows the fleet to supported M-family Graviton instances.&lt;/p&gt;

&lt;p&gt;That is a good lesson in cloud architecture generally: cloud products market flexibility, but production systems survive on constraint management.&lt;/p&gt;

&lt;p&gt;The teams that do well with modern AWS services are not the ones that assume every SKU works. They are the ones that encode the service's real boundaries in Terraform so no one has to rediscover them during an incident window.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 3: Warmup Is Not a Hack. It Is a Deliberate Control Loop.
&lt;/h2&gt;

&lt;p&gt;There is a tendency in engineering circles to treat "warming" as a slightly embarrassing workaround. I think that is the wrong mindset.&lt;/p&gt;

&lt;p&gt;This implementation schedules the CLM router Lambda every five minutes through EventBridge. The handler itself is intentionally lightweight and effectively acts as a keep-alive mechanism. That is not laziness. It is an explicit decision to keep the shared pool alive for latency-sensitive traffic.&lt;/p&gt;

&lt;p&gt;More specifically, the warmer exists to reduce the probability that the capacity provider has to spin up fresh managed instance capacity for a new invocation path after a quiet period. That is the practical point of the EventBridge rule in &lt;code&gt;terraform_file/eventbridge_cp_arm.tf&lt;/code&gt;. By invoking the Lambda on a steady &lt;code&gt;rate(5 minutes)&lt;/code&gt; schedule, the platform keeps the execution path warm enough that the shared capacity provider is less likely to fall all the way back to a cold, scale-from-zero posture right before a real request arrives.&lt;/p&gt;
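&lt;p&gt;To make the "intentionally lightweight" handler concrete, here is a minimal sketch of how a warm-up invocation could be detected and short-circuited. It assumes the rule delivers the default EventBridge scheduled-event payload, which sets &lt;code&gt;source&lt;/code&gt; to &lt;code&gt;aws.events&lt;/code&gt; and &lt;code&gt;detail-type&lt;/code&gt; to &lt;code&gt;Scheduled Event&lt;/code&gt;. If the target is configured with custom input, the check would need to match that input instead. The function names and return shapes are hypothetical, not taken from the repo.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def is_warmup_event(event):
    """True for the default payload of an EventBridge scheduled rule."""
    return (
        isinstance(event, dict)
        and event.get("source") == "aws.events"
        and event.get("detail-type") == "Scheduled Event"
    )


def route_request(event):
    # Placeholder for the real routing logic (hypothetical).
    return {"warmed": False, "routed": True}


def handler(event, context):
    if is_warmup_event(event):
        # Return before any routing work so the keep-alive ping stays cheap.
        return {"warmed": True}
    return route_request(event)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Returning early keeps the warm-up cost close to the invocation overhead itself, which is what makes a five-minute schedule affordable.&lt;/p&gt;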

&lt;p&gt;The important insight is this: once you care about cold-start predictability, you need a control loop.&lt;/p&gt;

&lt;p&gt;That control loop can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provisioned concurrency&lt;/li&gt;
&lt;li&gt;Scheduled warmers&lt;/li&gt;
&lt;li&gt;Request shaping&lt;/li&gt;
&lt;li&gt;A shared managed instance pool&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this design, the team chose scheduled warm invocation plus a shared capacity provider. That is a sensible middle ground. It is cheaper and simpler than overcommitting always-on infrastructure, while still materially reducing the first-hit penalty.&lt;/p&gt;

&lt;p&gt;In plain English: the EventBridge warmer is being used here so the capacity provider does not need to spin up a brand-new server footprint every time traffic reappears after idle time. For interactive or latency-sensitive agent workloads, that is a very practical optimization.&lt;/p&gt;

&lt;p&gt;The strategic lesson is that warmup should be measured against business latency, not ideological purity. If a five-minute EventBridge schedule protects user experience and keeps cost acceptable, it is doing its job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 4: Shared Pools Create Efficiency, but They Also Create Coupling
&lt;/h2&gt;

&lt;p&gt;The capacity provider here is intentionally shared across platform agents and automation services. That is the right move early in a platform journey because it improves utilization and prevents every Lambda from inventing its own isolated infrastructure story.&lt;/p&gt;

&lt;p&gt;But shared pools always introduce two forms of coupling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Technical coupling, because multiple workloads compete for the same execution substrate&lt;/li&gt;
&lt;li&gt;Organizational coupling, because one team's deployment patterns can affect another team's cost and performance envelope&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why the concurrency controls here matter. The CLM router function uses a per-execution-environment concurrency setting, and the environment-specific &lt;code&gt;.tfvars&lt;/code&gt; files pin that concurrency to &lt;code&gt;4&lt;/code&gt;. That is more than a performance number. It is a fairness policy.&lt;/p&gt;

&lt;p&gt;If I were advising a platform team scaling this pattern, I would say this clearly: shared capacity providers are excellent, but they need quota thinking from day one. Otherwise the first successful workload becomes the first noisy neighbor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 5: If You Publish Versions Aggressively, You Need Lifecycle Hygiene on Day One
&lt;/h2&gt;

&lt;p&gt;This implementation makes another good call: the Lambda functions are published, aliased, and then cleaned up with an automated version pruner.&lt;/p&gt;

&lt;p&gt;That matters because version sprawl is one of those quiet operational problems that teams ignore until it becomes annoying enough to disrupt deployments. Published versions accumulate quickly when CI/CD is active, and every published version counts toward the account's Lambda code storage quota for the Region. If you do not manage them, you eventually pay in clutter, confusion, or a hard service limit.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;lambda_version_pruner&lt;/code&gt; implementation is stronger than a simplistic cleanup script because it preserves what actually matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It scans all Lambda functions&lt;/li&gt;
&lt;li&gt;It filters only functions associated with the target capacity provider&lt;/li&gt;
&lt;li&gt;It lists all aliases and protects aliased versions&lt;/li&gt;
&lt;li&gt;It keeps the latest N published versions&lt;/li&gt;
&lt;li&gt;It deletes everything older that is neither current nor aliased&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly the kind of automation mature teams invest in. Not glamorous. Very valuable.&lt;/p&gt;

&lt;p&gt;There is also an understated platform principle here: rollback is not just about keeping artifacts. It is about keeping the right artifacts. By preserving aliased versions, the pruner respects deployment intent rather than blindly optimizing for tidiness.&lt;/p&gt;
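&lt;p&gt;The selection rules listed above can be sketched as a small pure function. This is an illustration of the keep-latest-N plus protect-aliases logic only, not the code of &lt;code&gt;lambda_version_pruner&lt;/code&gt;. The function name and parameters are hypothetical, and the AWS calls that would list versions and aliases are deliberately left out.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def select_versions_to_delete(published_versions, aliased_versions, keep_latest):
    """Return prunable version strings, oldest first."""
    # Published Lambda versions are numeric strings. "$LATEST" is never a candidate.
    numeric = sorted(
        (v for v in published_versions if v != "$LATEST"),
        key=int,
    )
    # Keep the newest `keep_latest` versions so rollback stays possible.
    recent = set(numeric[len(numeric) - keep_latest:])
    # Anything an alias points to expresses deployment intent, so protect it too.
    protected = recent | set(aliased_versions)
    return [v for v in numeric if v not in protected]


versions = ["$LATEST", "1", "2", "3", "4", "5", "10"]
print(select_versions_to_delete(versions, {"2"}, keep_latest=3))  # prints ['1', '3']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Sorting by integer value rather than by string matters here: a plain string sort would place version 10 before version 2 and prune the wrong artifacts.&lt;/p&gt;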

&lt;p&gt;There is also a more practical capacity-provider reason for doing this, and it deserves to be stated directly.&lt;/p&gt;

&lt;p&gt;When you run a shared Lambda Managed Instances pool, you want the platform to spend its effort on the versions that are actually serving traffic, warming correctly, or remaining available for safe rollback. If old published versions keep accumulating forever, three unhealthy things tend to happen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;operators lose clarity on which versions are still meaningful&lt;/li&gt;
&lt;li&gt;rollback and alias management become noisier than they should be&lt;/li&gt;
&lt;li&gt;the shared platform carries more deployment residue than useful runtime intent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Strictly speaking, deleting old Lambda versions does not add CPU or memory to the capacity provider. What it does is improve platform hygiene around the shared pool: the versions attached to aliases, warmup patterns, and deployment workflows remain deliberate and limited. The benefit is operational rather than computational, because less version sprawl surrounds the workloads that consume that shared capacity.&lt;/p&gt;

&lt;p&gt;That matters in real operations. The healthier the deployment surface is, the easier it is to reason about what is warming, what is active, what can be rolled back, and what should no longer influence the platform at all.&lt;/p&gt;

&lt;p&gt;So the version pruner is not just a cleanup utility. It is part of making the shared capacity provider operationally efficient. Not by adding raw compute, but by reducing noise, protecting the versions that matter, and keeping the platform focused on live execution paths instead of historical leftovers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 6: GitHub Actions Should Orchestrate. CodeBuild Should Execute.
&lt;/h2&gt;

&lt;p&gt;Architecturally, the CI/CD model here is sensible.&lt;/p&gt;

&lt;p&gt;GitHub Actions is used as the control plane:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;branch-based triggering&lt;/li&gt;
&lt;li&gt;security scanning&lt;/li&gt;
&lt;li&gt;environment selection&lt;/li&gt;
&lt;li&gt;AWS credential injection&lt;/li&gt;
&lt;li&gt;build orchestration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS CodeBuild is used as the execution plane:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terraform install&lt;/li&gt;
&lt;li&gt;&lt;code&gt;terraform init&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;terraform validate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;terraform plan&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;terraform apply&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I like this split. It keeps GitHub Actions lightweight and makes AWS the place where the actual infrastructure mutation happens. That usually gives better access control, cleaner auditability, and fewer surprises around long-running plan or apply steps.&lt;/p&gt;

&lt;p&gt;The buildspecs pin Terraform &lt;code&gt;1.12.2&lt;/code&gt;, install the CLI explicitly, and then execute plan/apply flows with environment-specific variable files. That is exactly the kind of boring repeatability you want in infrastructure delivery.&lt;/p&gt;

&lt;p&gt;This is one of the most practical lessons from the implementation: do not force GitHub Actions to be your full deployment runtime if AWS-native execution gives you better control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 7: CI/CD Maturity Is Not About Having a Pipeline. It Is About Where the Gates Actually Are.
&lt;/h2&gt;

&lt;p&gt;The implementation also reveals a harder truth: CI/CD design is won or lost not by YAML volume, but by trigger discipline.&lt;/p&gt;

&lt;p&gt;There are some good instincts here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dev deployment is chained off a successful security workflow&lt;/li&gt;
&lt;li&gt;Security scanning runs on push and PR for &lt;code&gt;dev&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;PR security review is scoped only to actual code and infrastructure changes&lt;/li&gt;
&lt;li&gt;Environment-specific secrets are used for AWS access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That said, the current implementation also shows the kinds of issues every fast-moving team encounters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The dev deploy workflow is triggered by &lt;code&gt;Security Checks (Push)&lt;/code&gt;, not by a broader quality gate such as tests plus security plus static analysis&lt;/li&gt;
&lt;li&gt;The QA workflow is currently triggered on &lt;code&gt;pull_request&lt;/code&gt; to &lt;code&gt;qa&lt;/code&gt;, yet it also includes an apply stage. That is a risky combination because infrastructure can be changed before the pull request has been reviewed or merged&lt;/li&gt;
&lt;li&gt;The sanity workflow references a different CodeBuild project naming pattern, which looks like copy-forward drift from another implementation&lt;/li&gt;
&lt;li&gt;One dev apply step mixes generic and environment-specific secrets in a way that deserves tightening&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a criticism of the team. It is actually the most authentic part of the system.&lt;/p&gt;

&lt;p&gt;Real pipelines evolve through reuse, renaming, urgency, and partial migration. The useful engineering habit is not pretending they are pristine. It is recognizing that pipeline drift is itself a production concern.&lt;/p&gt;

&lt;p&gt;My blunt lesson here is this: CI/CD is software. It needs the same review rigor as application code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 8: Documentation Drift Is a Reliability Signal
&lt;/h2&gt;

&lt;p&gt;The README here is ambitious and useful, but parts of it clearly describe a broader or earlier architecture than the exact files currently present. That mismatch is more important than most teams realize.&lt;/p&gt;

&lt;p&gt;When documentation and implementation diverge, three things happen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;new engineers learn the wrong system&lt;/li&gt;
&lt;li&gt;reviewers approve changes with outdated mental models&lt;/li&gt;
&lt;li&gt;incidents take longer to resolve because operators trust stale diagrams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One of the best engineering habits is to treat documentation drift as an operational bug, not as a cosmetic issue.&lt;/p&gt;

&lt;p&gt;This implementation makes that case well. The code is the source of truth. The docs are directionally strong, but some names, workflow descriptions, and file references have clearly moved over time. That is normal. What matters is catching it before the next engineer builds decisions on old assumptions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 9: The Default VPC Is Fine for Speed, but It Should Be a Conscious Temporary Convenience
&lt;/h2&gt;

&lt;p&gt;The Terraform intentionally uses the default VPC and default subnets, then layers in filtering and a custom security group. For early velocity, that is an acceptable choice. It removes friction and makes the first deployment much easier.&lt;/p&gt;

&lt;p&gt;But teams should be honest about the tradeoff.&lt;/p&gt;

&lt;p&gt;Using the default VPC accelerates setup. It does not provide the same clarity, segmentation, or policy hygiene that a dedicated workload VPC eventually should. The inbound HTTPS rule from &lt;code&gt;0.0.0.0/0&lt;/code&gt; is another example of where a practical early-stage decision should later be revisited with a more opinionated security posture.&lt;/p&gt;

&lt;p&gt;My view is simple: default VPC usage is fine when it is a speed decision. It becomes dangerous when it silently hardens into architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 10: Least Privilege Usually Loses the First Battle. Do Not Let It Lose the War.
&lt;/h2&gt;

&lt;p&gt;The Lambda IAM policy for the router function is broad. Very broad.&lt;/p&gt;

&lt;p&gt;That is common when a platform team is trying to unblock integration work quickly across S3, SQS, SNS, DynamoDB, Bedrock, AppSync, logs, X-Ray, and secrets. The version pruner is noticeably tighter, which is encouraging. But the broader pattern remains familiar: the first version of a system usually over-grants.&lt;/p&gt;

&lt;p&gt;The lesson is not "never do that." The lesson is "know when you are doing it, and schedule the hardening work while the platform is still comprehensible."&lt;/p&gt;

&lt;p&gt;Security debt compounds. The longer a wide-open policy survives, the more invisible it becomes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Repo Gets Right
&lt;/h2&gt;

&lt;p&gt;If I strip away the drift and focus on the platform instincts, this implementation gets a lot right:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It treats capacity provider infrastructure as shared platform capability, not one-off function plumbing&lt;/li&gt;
&lt;li&gt;It optimizes for ARM64 economics instead of defaulting to x86 out of habit&lt;/li&gt;
&lt;li&gt;It acknowledges cold starts as a business problem and addresses them operationally&lt;/li&gt;
&lt;li&gt;It preserves rollback safety with aliases while still pruning version sprawl&lt;/li&gt;
&lt;li&gt;It separates orchestration from execution in CI/CD&lt;/li&gt;
&lt;li&gt;It encodes AWS service constraints in Terraform comments and defaults, which reduces tribal knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a strong foundation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Would Improve Next
&lt;/h2&gt;

&lt;p&gt;If I were turning this into the next version of a production-grade internal platform, I would prioritize the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Tighten naming consistency across the implementation.&lt;br&gt;
The capacity provider name appears in slightly different forms across resources. That is how automation misses its target. Shared naming locals should eliminate this class of error.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make QA and production promotion rules stricter.&lt;br&gt;
A PR-triggered apply path should be removed. The cleaner model is to run plan on the PR and run apply only from a protected branch or behind an approved environment gate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run Terraform from a single explicit working directory.&lt;br&gt;
The current layout places Terraform under &lt;code&gt;terraform_file/&lt;/code&gt;, while some buildspec commands read like root-level execution. That ambiguity should be eliminated.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Move from broad IAM toward intent-based policies.&lt;br&gt;
Especially for the router Lambda, policy scope should narrow as the workload stabilizes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Revisit networking posture.&lt;br&gt;
The default VPC is fine for speed; a dedicated VPC model is better for longevity, auditability, and controlled ingress.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add stronger deployment quality gates.&lt;br&gt;
Security review is useful, but infrastructure promotion should also hang off validation, tests, linting, and explicit approval where appropriate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add platform observability as code.&lt;br&gt;
CloudWatch alarms, dashboarding, and cost visibility for the capacity provider should be treated as first-class Terraform resources, not follow-up tasks.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Bigger Technical Lesson
&lt;/h2&gt;

&lt;p&gt;The biggest takeaway from this implementation is not about Lambda specifically.&lt;/p&gt;

&lt;p&gt;It is about how modern platform teams should build.&lt;/p&gt;

&lt;p&gt;We should absolutely chase better cost-performance curves. We should use managed primitives aggressively. We should automate the boring work. But we also need the discipline to encode what we learn while the system is still small enough to reason about.&lt;/p&gt;

&lt;p&gt;What makes this useful is that it shows both halves of real engineering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the architectural intent&lt;/li&gt;
&lt;li&gt;the implementation scars&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination is where credible engineering judgment comes from.&lt;/p&gt;

&lt;p&gt;Anyone can present a clean target state. The harder and more useful skill is building systems that survive contact with deployment friction, service constraints, naming drift, and operational reality.&lt;/p&gt;

&lt;p&gt;That is what this implementation is doing. And that is why the lessons here matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Thought
&lt;/h2&gt;

&lt;p&gt;Capacity providers, warmers, version pruning, and GitHub-driven delivery are not separate topics. They are all answers to the same technical question:&lt;/p&gt;

&lt;p&gt;How do we make cloud systems faster, cheaper, safer, and more repeatable without turning every application team into a specialized infrastructure group?&lt;/p&gt;

&lt;p&gt;In this implementation, the answer was to centralize the hard platform decisions, automate the hygiene, keep the runtime warm where it matters, and stay honest about the places where the system still needs tightening.&lt;/p&gt;

&lt;p&gt;That is not just good infrastructure work.&lt;/p&gt;

&lt;p&gt;That is good engineering practice.&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>lambda</category>
      <category>aws</category>
    </item>
    <item>
      <title>Lessons I learned building a memory-aware agent with Amazon Bedrock AgentCore Runtime</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 20 Apr 2026 18:10:16 +0000</pubDate>
      <link>https://dev.to/amitkayal/lessons-i-learned-building-a-memory-aware-agent-with-amazon-bedrock-agentcore-runtime-4lc9</link>
      <guid>https://dev.to/amitkayal/lessons-i-learned-building-a-memory-aware-agent-with-amazon-bedrock-agentcore-runtime-4lc9</guid>
      <description>&lt;h1&gt;
  
  
  Lessons I learned building a memory-aware agent with Amazon Bedrock AgentCore Runtime
&lt;/h1&gt;

&lt;p&gt;When I started building an agent with Amazon Bedrock AgentCore Runtime, I thought the difficult parts would be model selection, tool wiring, and deployment. Those certainly mattered, but the part that shaped the quality of the agent most was memory.&lt;/p&gt;

&lt;p&gt;The first version of the agent could answer single prompts well enough, but it did not behave like a real multi-turn system. Follow-up questions were brittle. The agent lost short-range intent. Tool usage worked, but only within the narrow boundaries of the current prompt. As soon as the conversation depended on what happened one or two turns earlier, the system started to feel less like an agent and more like a stateless inference endpoint.&lt;/p&gt;

&lt;p&gt;That experience changed how I approached the design. I stopped thinking about memory as a convenience feature and started treating it as part of the runtime architecture itself. This article is a distillation of the most important lessons I learned while building a short-term-memory-aware agent with Amazon Bedrock AgentCore Runtime and Strands.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 1: An agent is not really multi-turn until memory is part of the lifecycle
&lt;/h2&gt;

&lt;p&gt;One of the first things I learned is that conversational continuity does not emerge automatically just because the application calls the same runtime repeatedly.&lt;/p&gt;

&lt;p&gt;Without short-term memory, the agent only sees the current prompt unless the application keeps reconstructing and replaying history manually. That creates several problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;previous instructions are easy to lose,&lt;/li&gt;
&lt;li&gt;tool chains become fragile across turns,&lt;/li&gt;
&lt;li&gt;users have to restate identifiers and intent,&lt;/li&gt;
&lt;li&gt;the system becomes increasingly prompt-shaped rather than interaction-shaped.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What became clear to me is that short-term memory is not about storing everything forever. It is about preserving enough recent state for the current conversation to remain coherent.&lt;/p&gt;

&lt;p&gt;That distinction matters. I was not trying to build a knowledge base or semantic fact store. I was trying to answer a simpler question: how do I help the agent remember what we were just doing?&lt;/p&gt;

&lt;p&gt;Once I framed the problem that way, the architecture became much clearer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 2: The cleanest pattern is explicit memory, not implicit transcript magic
&lt;/h2&gt;

&lt;p&gt;Another lesson I learned quickly is that I did not want memory to be hidden behind vague runtime behavior. I wanted the agent code to make memory use explicit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;where memory comes from,&lt;/li&gt;
&lt;li&gt;when it is read,&lt;/li&gt;
&lt;li&gt;when it is written,&lt;/li&gt;
&lt;li&gt;which user it belongs to,&lt;/li&gt;
&lt;li&gt;which conversation it belongs to.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That led me to a pattern built around &lt;code&gt;MemoryClient&lt;/code&gt; and hooks.&lt;/p&gt;

&lt;p&gt;Instead of treating memory like a passive transcript that somehow appears at the edge of the request, I found it much more reliable to think about it as a lifecycle-managed dependency:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;create a short-term memory resource,&lt;/li&gt;
&lt;li&gt;pass the memory identity into the runtime,&lt;/li&gt;
&lt;li&gt;read recent turns when the agent initializes,&lt;/li&gt;
&lt;li&gt;write new messages as events when the conversation changes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The important shift for me was this: memory worked best when it was part of the agent object model, not just part of request handling glue code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 3: Hooks are where memory belongs
&lt;/h2&gt;

&lt;p&gt;This was probably the biggest implementation insight.&lt;/p&gt;

&lt;p&gt;Once I had a Strands-based agent running inside AgentCore Runtime, I needed to decide where the memory logic should live. I could have put everything directly into the entrypoint and manually stitched together request parsing, history retrieval, message persistence, and prompt injection. That would have worked, but it would have made the agent lifecycle harder to reason about.&lt;/p&gt;

&lt;p&gt;What worked better was using hooks tied to the agent lifecycle itself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;AgentInitializedEvent&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;MessageAddedEvent&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That structure gave me a much cleaner mental model.&lt;/p&gt;

&lt;p&gt;On initialization, the agent needs context before it reasons. That is the right moment to retrieve the most recent turns from memory and inject them into prompt context.&lt;/p&gt;

&lt;p&gt;When a new message is added, the conversation state has changed. That is the right moment to persist the latest user or assistant message back into memory.&lt;/p&gt;

&lt;p&gt;The core interaction looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;recent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_last_k_turns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What I like about this model is that it is deterministic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory load happens before reasoning,&lt;/li&gt;
&lt;li&gt;memory write happens when conversation state changes,&lt;/li&gt;
&lt;li&gt;both operations use the same identity boundaries,&lt;/li&gt;
&lt;li&gt;the entrypoint stays focused on request extraction rather than conversation orchestration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That made the system easier to debug, easier to extend, and much easier to explain.&lt;/p&gt;
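&lt;p&gt;To show the load-then-persist lifecycle end to end, here is a self-contained sketch. It does not use the real Strands hook registry or the real &lt;code&gt;MemoryClient&lt;/code&gt;. Instead, &lt;code&gt;InMemoryMemoryClient&lt;/code&gt; is a hypothetical stand-in that mirrors only the two calls shown above, and the hook class and its method names are assumptions made for illustration. In a real agent, these two methods would be registered as callbacks for &lt;code&gt;AgentInitializedEvent&lt;/code&gt; and &lt;code&gt;MessageAddedEvent&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;class InMemoryMemoryClient:
    """Hypothetical stand-in exposing only the two calls shown above."""

    def __init__(self):
        self._events = {}

    def create_event(self, memory_id, actor_id, session_id, messages):
        key = (memory_id, actor_id, session_id)
        self._events.setdefault(key, []).extend(messages)

    def get_last_k_turns(self, memory_id, actor_id, session_id, k):
        # Simplification: each stored (text, role) message counts as one turn.
        history = self._events.get((memory_id, actor_id, session_id), [])
        return history[len(history) - k:]


class ShortTermMemoryHooks:
    """Illustrative lifecycle hooks. The method names are assumptions."""

    def __init__(self, client, memory_id, actor_id, session_id, k=5):
        self.client = client
        self.scope = dict(memory_id=memory_id, actor_id=actor_id, session_id=session_id)
        self.k = k

    def on_agent_initialized(self):
        # Load recent turns before the agent reasons, as prompt context text.
        turns = self.client.get_last_k_turns(k=self.k, **self.scope)
        return "\n".join(f"{role}: {text}" for text, role in turns)

    def on_message_added(self, text, role):
        # Persist each new message under the same identity boundaries.
        self.client.create_event(messages=[(text, role)], **self.scope)


client = InMemoryMemoryClient()
first_request = ShortTermMemoryHooks(client, "mem-1", "user-1", "sess-a")
first_request.on_message_added("Where is order 42?", "USER")
first_request.on_message_added("It shipped yesterday.", "ASSISTANT")

# A later request builds fresh hooks but reads the same scoped history.
second_request = ShortTermMemoryHooks(client, "mem-1", "user-1", "sess-a")
print(second_request.on_agent_initialized())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because both hooks share one scope object, a read and a write can never disagree about which user and conversation they belong to.&lt;/p&gt;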

&lt;h2&gt;
  
  
  Lesson 4: Identity is the real memory boundary
&lt;/h2&gt;

&lt;p&gt;Before building this, I thought of memory mostly as a storage problem. In practice, I learned it is just as much an identity problem.&lt;/p&gt;

&lt;p&gt;The two identifiers that mattered most were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;actor_id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;session_id&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation ended up being foundational.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why &lt;code&gt;actor_id&lt;/code&gt; matters
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;actor_id&lt;/code&gt; is the user boundary. If that identifier is unstable, absent, or inconsistent, memory quality degrades immediately.&lt;/p&gt;

&lt;p&gt;What I learned is that a memory system is only as good as the application identity you feed into it. If the same user appears under multiple IDs, the agent cannot retrieve a coherent conversational history. If two users are accidentally mapped to the same identity, memory becomes unsafe.&lt;/p&gt;

&lt;p&gt;So one of my strongest takeaways is that &lt;code&gt;actor_id&lt;/code&gt; should always come from a stable authenticated user identity, not from an incidental client-generated value.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why &lt;code&gt;session_id&lt;/code&gt; matters
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;session_id&lt;/code&gt; turned out to be just as important. A single user does not have just one conversation. They may have multiple active threads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one troubleshooting flow,&lt;/li&gt;
&lt;li&gt;one transcript analysis request,&lt;/li&gt;
&lt;li&gt;one abandoned conversation from earlier,&lt;/li&gt;
&lt;li&gt;one brand-new task.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without a session boundary, all of that collapses into one memory stream. The agent might technically “remember,” but it remembers too much of the wrong thing.&lt;/p&gt;

&lt;p&gt;That was a key lesson for me: useful memory is not just preserved memory. It is correctly scoped memory.&lt;/p&gt;
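&lt;p&gt;One way to enforce both boundaries at the entrypoint is sketched below. It assumes the caller's verified token claims are available as a dictionary with a &lt;code&gt;sub&lt;/code&gt; field and that the client may send a &lt;code&gt;session_id&lt;/code&gt; in the request payload. Both of those shapes, and the function name, are assumptions for this example rather than part of the AgentCore API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import uuid


def resolve_memory_scope(auth_claims, payload):
    """Return (actor_id, session_id) for the current request."""
    # actor_id comes only from verified claims, never from the request body.
    actor_id = auth_claims.get("sub")
    if not actor_id:
        raise PermissionError("no authenticated subject, refusing to guess an actor_id")
    # Reuse the caller's session when one is supplied, otherwise start a new thread.
    session_id = payload.get("session_id") or uuid.uuid4().hex
    return actor_id, session_id


claims = {"sub": "user-123"}
request = {"prompt": "continue", "session_id": "sess-a", "actor_id": "spoofed"}
print(resolve_memory_scope(claims, request))  # prints ('user-123', 'sess-a')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Ignoring any &lt;code&gt;actor_id&lt;/code&gt; in the payload is deliberate: it prevents one user from reading another user's memory by changing a request field.&lt;/p&gt;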

&lt;h2&gt;
  
  
  Lesson 5: The agent should be rebuilt per request, but memory should persist across requests
&lt;/h2&gt;

&lt;p&gt;This was an architectural point that became clearer as I implemented the runtime flow.&lt;/p&gt;

&lt;p&gt;The Strands agent instance itself is created per request. That makes sense because each invocation carries request-specific state:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the current user prompt,&lt;/li&gt;
&lt;li&gt;the active user identity,&lt;/li&gt;
&lt;li&gt;the active conversation session,&lt;/li&gt;
&lt;li&gt;the active tool and runtime context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But memory should not behave like request-local state. Memory has to outlive the agent instance and remain keyed to the same user and conversation across invocations.&lt;/p&gt;

&lt;p&gt;That split was important for me to internalize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;agent instance lifecycle is short,&lt;/li&gt;
&lt;li&gt;conversation memory lifecycle is longer,&lt;/li&gt;
&lt;li&gt;the link between them is established through state and hooks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once I started thinking in those terms, the design felt much more natural.&lt;/p&gt;
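
&lt;p&gt;The split can be sketched as follows. This is an illustration under stated assumptions: &lt;code&gt;RequestAgent&lt;/code&gt;, &lt;code&gt;MEMORY_STORE&lt;/code&gt;, and &lt;code&gt;handle_request&lt;/code&gt; are hypothetical names, the dictionary stands in for a persistent memory service, and the two private methods play the role of the initialization and message-added hooks.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Stand-in for a persistent memory service; it outlives every agent instance.
MEMORY_STORE = {}


class RequestAgent:
    """Hypothetical per-request agent; not the Strands Agent class."""

    def __init__(self, actor_id, session_id):
        self.key = (actor_id, session_id)
        self.messages = []
        self._on_initialized()

    def _on_initialized(self):
        # Initialization hook: load earlier turns for this user and session.
        self.messages = list(MEMORY_STORE.get(self.key, []))

    def add_message(self, role, text):
        self.messages.append((role, text))
        self._on_message_added(role, text)

    def _on_message_added(self, role, text):
        # Message hook: persist each new message as soon as it is added.
        MEMORY_STORE.setdefault(self.key, []).append((role, text))


def handle_request(actor_id, session_id, prompt):
    agent = RequestAgent(actor_id, session_id)  # short-lived instance
    agent.add_message("user", prompt)
    return len(agent.messages)  # turns visible to this invocation


first = handle_request("user-123", "session-a", "Find the transcript")
second = handle_request("user-123", "session-a", "Now summarize that")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The second call builds a brand-new agent, yet it sees the earlier turn because the store, not the instance, owns the conversation.&lt;/p&gt;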

&lt;h2&gt;
  
  
  Lesson 6: Deployment is part of the memory design
&lt;/h2&gt;

&lt;p&gt;I originally thought of deployment as a separate concern from conversational behavior. Building this agent convinced me that the two are tightly connected.&lt;/p&gt;

&lt;p&gt;The runtime needs to know which memory resource it should use, but I did not want that decision hardcoded in application logic. The better pattern was to resolve the correct memory resource during deployment and pass that identity into the runtime as configuration.&lt;/p&gt;

&lt;p&gt;In practice, that meant the runtime received environment-specific values such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AGENT_NAME=&amp;lt;agent-name&amp;gt;
MEMORY_ID=&amp;lt;memory-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gave me a few benefits immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the same application code could move across environments,&lt;/li&gt;
&lt;li&gt;memory resources stayed aligned with environment boundaries,&lt;/li&gt;
&lt;li&gt;the runtime remained configurable without source changes,&lt;/li&gt;
&lt;li&gt;the control plane remained the primary place where resource binding happened.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One of the clearest lessons here is that memory should be treated like any other environment-bound infrastructure dependency. If it is not part of deployment, it tends to become a hidden assumption.&lt;/p&gt;
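
&lt;p&gt;On the runtime side, reading that configuration can be as small as the sketch below. The variable names &lt;code&gt;AGENT_NAME&lt;/code&gt; and &lt;code&gt;MEMORY_ID&lt;/code&gt; come from the example above; the &lt;code&gt;load_runtime_config&lt;/code&gt; helper and the sample values are assumptions made for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os


def load_runtime_config(env=None):
    """Read deployment-provided settings; returns None for anything unset."""
    env = os.environ if env is None else env
    return {
        "agent_name": env.get("AGENT_NAME") or None,
        "memory_id": env.get("MEMORY_ID") or None,
    }


# Sample values only; a deployment would inject the real ones.
config = load_runtime_config({"AGENT_NAME": "demo-agent", "MEMORY_ID": "mem-dev-001"})
memory_enabled = config["memory_id"] is not None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Passing the environment in as a parameter keeps the function easy to exercise in tests without touching the real process environment.&lt;/p&gt;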

&lt;h2&gt;
  
  
  Lesson 7: Short-term memory and long-term memory solve different problems
&lt;/h2&gt;

&lt;p&gt;I found it helpful to stop using the word “memory” as if it meant one thing.&lt;/p&gt;

&lt;p&gt;Short-term memory answers the question:&lt;/p&gt;

&lt;p&gt;"What was happening in this conversation recently?"&lt;/p&gt;

&lt;p&gt;Long-term memory answers a different question:&lt;/p&gt;

&lt;p&gt;"What durable information should the system remember beyond this immediate interaction?"&lt;/p&gt;

&lt;p&gt;For the agent I was building, the short-term problem came first. I needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;recent-turn continuity,&lt;/li&gt;
&lt;li&gt;bounded replay,&lt;/li&gt;
&lt;li&gt;session-scoped context,&lt;/li&gt;
&lt;li&gt;predictable event retention.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I did not need semantic fact retrieval in the first phase. I did not need vector search for historical knowledge. I needed the agent to remain coherent across adjacent turns.&lt;/p&gt;

&lt;p&gt;That was an important design simplification. It kept the first version of the memory architecture focused on event continuity instead of overextending into knowledge retrieval prematurely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 8: Recent-turn replay should be bounded
&lt;/h2&gt;

&lt;p&gt;Once I had memory retrieval working, the next question was how much of it to inject back into the agent context.&lt;/p&gt;

&lt;p&gt;My lesson here was simple: more memory is not always better memory.&lt;/p&gt;

&lt;p&gt;If too much prior conversation is replayed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompt size grows,&lt;/li&gt;
&lt;li&gt;token cost grows,&lt;/li&gt;
&lt;li&gt;stale context starts competing with the current task,&lt;/li&gt;
&lt;li&gt;reasoning quality can actually decline.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I found the most practical pattern was to retrieve the last few turns and inject them into prompt context in a compact representation. In this design, that replay window was bounded at five turns.&lt;/p&gt;

&lt;p&gt;That gave me a good balance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;enough recent context for continuity,&lt;/li&gt;
&lt;li&gt;small enough context for predictable prompt growth,&lt;/li&gt;
&lt;li&gt;simple enough formatting to inspect and debug.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This also reinforced another lesson: short-term memory should be operationally understandable. I want to know what context the model saw, not just trust that some opaque memory layer handled it correctly.&lt;/p&gt;
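
&lt;p&gt;A bounded replay can be sketched in a few lines. The five-turn window comes from the design described above; the &lt;code&gt;build_replay_context&lt;/code&gt; helper, the tuple shape of a turn, and the line format are assumptions chosen to keep the output easy to inspect.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;MAX_REPLAY_TURNS = 5  # replay window described in the text


def build_replay_context(turns, max_turns=MAX_REPLAY_TURNS):
    """Format the most recent turns as compact, inspectable lines."""
    recent = turns[-max_turns:] if max_turns else []
    return "\n".join(f"{role}: {text}" for role, text in recent)


# Eight stored turns; only the last five should reach the prompt.
turns = [("user", f"message {n}") for n in range(1, 9)]
context = build_replay_context(turns)
print(context)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the result is plain text, it can be logged next to the request, which makes it straightforward to see exactly what context the model received.&lt;/p&gt;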

&lt;h2&gt;
  
  
  Lesson 9: Memory becomes more valuable when tools are involved
&lt;/h2&gt;

&lt;p&gt;The agent I built was not just a conversational shell. It had tools, including domain-specific behavior such as transcript retrieval and AWS interactions.&lt;/p&gt;

&lt;p&gt;That is where the value of short-term memory became even more obvious.&lt;/p&gt;

&lt;p&gt;In a tool-using workflow, the user often does not repeat the full context every turn. They say things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"use the same meeting"&lt;/li&gt;
&lt;li&gt;"what did the second speaker say?"&lt;/li&gt;
&lt;li&gt;"now summarize that"&lt;/li&gt;
&lt;li&gt;"check the S3 output from before"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without memory, the agent has to reconstruct working state from a single prompt. With memory, the agent has a much better chance of preserving:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the active object under discussion,&lt;/li&gt;
&lt;li&gt;the prior user instruction,&lt;/li&gt;
&lt;li&gt;the last tool result,&lt;/li&gt;
&lt;li&gt;the intended next step.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One of my strongest takeaways is that memory is not just a conversational improvement. It is a workflow improvement. It makes tool orchestration across turns materially more coherent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 10: Failure modes need to be designed, not discovered in production
&lt;/h2&gt;

&lt;p&gt;Building this also made me think much more carefully about degraded behavior.&lt;/p&gt;

&lt;p&gt;If memory resolution fails and the runtime cannot find a memory resource, the agent may still run. That sounds convenient, but it also means the system may silently shift from stateful to stateless behavior.&lt;/p&gt;

&lt;p&gt;That taught me to treat the following as first-class operational conditions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory enabled,&lt;/li&gt;
&lt;li&gt;memory disabled,&lt;/li&gt;
&lt;li&gt;memory load succeeded,&lt;/li&gt;
&lt;li&gt;memory write succeeded,&lt;/li&gt;
&lt;li&gt;memory resolution failed,&lt;/li&gt;
&lt;li&gt;identity inputs were missing or malformed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The same thing applies to identity mistakes.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;actor_id&lt;/code&gt; is unstable, memory becomes fragmented.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;session_id&lt;/code&gt; is reused incorrectly, unrelated conversations bleed into each other.&lt;/p&gt;

&lt;p&gt;If replay windows grow without discipline, prompt quality degrades.&lt;/p&gt;

&lt;p&gt;These are not edge cases. They are part of the normal operating surface of a memory-aware agent.&lt;/p&gt;
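
&lt;p&gt;One way to make those conditions explicit is to classify the memory state on every request and log anything degraded. The sketch below covers a subset of the conditions listed above; the &lt;code&gt;MemoryStatus&lt;/code&gt; enum, the status strings, and &lt;code&gt;resolve_memory_status&lt;/code&gt; are hypothetical names, not part of any AWS SDK.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import logging
from enum import Enum

logger = logging.getLogger("agent.memory")


class MemoryStatus(Enum):
    ENABLED = "memory_enabled"
    DISABLED = "memory_disabled"
    RESOLUTION_FAILED = "memory_resolution_failed"
    IDENTITY_INVALID = "identity_missing_or_malformed"


def resolve_memory_status(memory_requested, memory_id, actor_id, session_id):
    """Classify the memory state for one request and log anything degraded."""
    if not memory_requested:
        return MemoryStatus.DISABLED
    if not memory_id:
        status = MemoryStatus.RESOLUTION_FAILED
    elif not actor_id or not session_id:
        status = MemoryStatus.IDENTITY_INVALID
    else:
        return MemoryStatus.ENABLED
    # Make the silent shift to stateless behavior visible to operators.
    logger.warning("running without memory: %s", status.value)
    return status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Emitting the status as a log field or metric turns "the agent quietly went stateless" into something that can be alerted on.&lt;/p&gt;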

&lt;h2&gt;
  
  
  Lesson 11: Retention, privacy, and compliance show up earlier than expected
&lt;/h2&gt;

&lt;p&gt;Short-term memory sounds lightweight, but it is still stored interaction data.&lt;/p&gt;

&lt;p&gt;That means retention policy is not just a platform setting. It is part of the product design. While building this, I became much more aware that memory decisions quickly intersect with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;data handling policy,&lt;/li&gt;
&lt;li&gt;privacy expectations,&lt;/li&gt;
&lt;li&gt;deletion and retention requirements,&lt;/li&gt;
&lt;li&gt;security review,&lt;/li&gt;
&lt;li&gt;production observability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The technical implementation can be elegant, but if these operational questions are not addressed early, the design will be incomplete.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 12: AgentCore became more useful to me when I treated it as a runtime system, not just a hosting target
&lt;/h2&gt;

&lt;p&gt;This may be the broadest lesson of all.&lt;/p&gt;

&lt;p&gt;At first, I thought of AgentCore Runtime mainly as the place where the agent container would run. But while building with memory, I started appreciating it more as a runtime environment with clear operational boundaries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the runtime executes the agent,&lt;/li&gt;
&lt;li&gt;the framework manages reasoning and tools,&lt;/li&gt;
&lt;li&gt;the memory plane manages event continuity,&lt;/li&gt;
&lt;li&gt;the deployment workflow binds the right resources together.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That view helped me move beyond “deploy a model wrapper in a container” toward “operate an agent system with state, identity, and lifecycle.”&lt;/p&gt;

&lt;p&gt;For me, that was the real shift.&lt;/p&gt;

&lt;h2&gt;
  
  
  The technical pattern I would reuse
&lt;/h2&gt;

&lt;p&gt;If I were building the same class of agent again, I would reuse the same high-level pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a dedicated short-term memory resource.&lt;/li&gt;
&lt;li&gt;Resolve the correct memory resource during deployment.&lt;/li&gt;
&lt;li&gt;Pass memory identity into the runtime explicitly.&lt;/li&gt;
&lt;li&gt;Build the agent per request with user and session state.&lt;/li&gt;
&lt;li&gt;Load recent turns during agent initialization.&lt;/li&gt;
&lt;li&gt;Persist new messages when they are added.&lt;/li&gt;
&lt;li&gt;Keep replay windows bounded.&lt;/li&gt;
&lt;li&gt;Treat &lt;code&gt;actor_id&lt;/code&gt; and &lt;code&gt;session_id&lt;/code&gt; as core correctness boundaries.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I would also keep the same mental model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;short-term memory is for continuity,&lt;/li&gt;
&lt;li&gt;long-term memory is for durable recall,&lt;/li&gt;
&lt;li&gt;hooks are the right place for memory orchestration,&lt;/li&gt;
&lt;li&gt;deployment is part of memory architecture,&lt;/li&gt;
&lt;li&gt;observability should make degraded memory behavior visible.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;The biggest lesson I learned while building with Amazon Bedrock AgentCore Runtime is that memory is not something you sprinkle onto an agent once the rest of the system works. Memory changes the shape of the system.&lt;/p&gt;

&lt;p&gt;It affects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;request lifecycle,&lt;/li&gt;
&lt;li&gt;identity boundaries,&lt;/li&gt;
&lt;li&gt;prompt construction,&lt;/li&gt;
&lt;li&gt;deployment,&lt;/li&gt;
&lt;li&gt;observability,&lt;/li&gt;
&lt;li&gt;privacy,&lt;/li&gt;
&lt;li&gt;and tool coherence across turns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once I accepted that, the architecture became much more disciplined. The agent became easier to reason about, easier to operate, and much more capable in real multi-turn interactions.&lt;/p&gt;

&lt;p&gt;That is the lesson I would carry into any future AgentCore build: if the experience is meant to feel conversational, memory has to be designed as a first-class runtime concern from the beginning.&lt;/p&gt;

</description>
      <category>agentcore</category>
      <category>aws</category>
      <category>serverless</category>
      <category>agentskills</category>
    </item>
    <item>
      <title>API Gateway as Websocket</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Tue, 21 Jan 2025 07:49:42 +0000</pubDate>
      <link>https://dev.to/amitkayal/api-gateway-as-websocket-5eee</link>
      <guid>https://dev.to/amitkayal/api-gateway-as-websocket-5eee</guid>
      <description>&lt;h1&gt;
  
  
  API Gateway as websocket
&lt;/h1&gt;

&lt;h2&gt;
  
  
  API Gateway as WS Components
&lt;/h2&gt;

&lt;p&gt;WebSocket provides bidirectional, session-aware communication between client and server and is a crucial component of real-time applications.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Setup API Gateway for WebSocket&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a WebSocket API in the Amazon API Gateway console or through infrastructure as code (IaC).&lt;/li&gt;
&lt;li&gt;Define the WebSocket API route selection expression, e.g., &lt;code&gt;$request.body.action&lt;/code&gt;. API Gateway evaluates it against each incoming message to decide which route, and therefore which integration, handles that message.&lt;/li&gt;
&lt;li&gt;Define the following WebSocket routes:

&lt;ul&gt;
&lt;li&gt;$connect: Triggered when a client establishes a connection.&lt;/li&gt;
&lt;li&gt;$disconnect: Triggered when a client disconnects.&lt;/li&gt;
&lt;li&gt;Custom routes, e.g., sendMessage, to handle specific actions.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Create an Integration with AWS Lambda&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For each route ($connect, $disconnect, custom routes), integrate a Lambda function to handle the respective logic.&lt;/li&gt;
&lt;li&gt;Use the Lambda function's handler to process:

&lt;ul&gt;
&lt;li&gt;$connect: Store the connection in DynamoDB.&lt;/li&gt;
&lt;li&gt;$disconnect: Remove the connection from DynamoDB.&lt;/li&gt;
&lt;li&gt;Custom routes: Process the message and forward it to SQS.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;DynamoDB for Connection Management&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a DynamoDB table to store:

&lt;ul&gt;
&lt;li&gt;Connection ID (Primary Key).&lt;/li&gt;
&lt;li&gt;Session ID or other metadata for grouping connections.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;This table allows tracking active WebSocket connections for broadcasting messages.&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Configure SQS for Message Queue&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use an SQS FIFO queue for guaranteed order and deduplication.&lt;/li&gt;
&lt;li&gt;Messages processed in Lambda (custom routes) are sent to SQS for downstream services.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;IAM Roles and Permissions&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assign an IAM role to the API Gateway to invoke the integrated Lambda functions.&lt;/li&gt;
&lt;li&gt;Grant Lambda permissions to read/write from DynamoDB and send messages to SQS.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Client Connection and Messaging&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use WebSocket-compatible libraries (e.g., ws in Node.js or the WebSocket API in browsers) to:

&lt;ul&gt;
&lt;li&gt;Establish a WebSocket connection to the API Gateway endpoint.&lt;/li&gt;
&lt;li&gt;Send and receive messages using the WebSocket protocol.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
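
&lt;p&gt;The Lambda side of these steps can be sketched as below. This is a simplified illustration: it uses one handler that dispatches on the route key instead of one function per route, and plain Python containers stand in for the DynamoDB table and the SQS queue so the example runs without AWS access. The &lt;code&gt;routeKey&lt;/code&gt; and &lt;code&gt;connectionId&lt;/code&gt; fields are read from the event's &lt;code&gt;requestContext&lt;/code&gt;; &lt;code&gt;CONNECTIONS&lt;/code&gt;, &lt;code&gt;QUEUE&lt;/code&gt;, and &lt;code&gt;make_event&lt;/code&gt; are hypothetical names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

# Stand-ins for the DynamoDB table and the SQS queue (assumptions for the sketch).
CONNECTIONS = {}
QUEUE = []


def handler(event, context=None):
    """Route a WebSocket event by the route key API Gateway passes in."""
    request_context = event["requestContext"]
    route = request_context["routeKey"]
    connection_id = request_context["connectionId"]

    if route == "$connect":
        CONNECTIONS[connection_id] = {"connectionId": connection_id}
    elif route == "$disconnect":
        CONNECTIONS.pop(connection_id, None)
    elif route == "sendMessage":
        body = json.loads(event.get("body") or "{}")
        QUEUE.append({"connectionId": connection_id, "message": body.get("message")})
    else:
        return {"statusCode": 400, "body": "unsupported route"}
    return {"statusCode": 200}


def make_event(route, connection_id, body=None):
    # Builds only the fields this sketch reads; real events carry many more.
    return {"requestContext": {"routeKey": route, "connectionId": connection_id}, "body": body}


handler(make_event("$connect", "abc123"))
handler(make_event("sendMessage", "abc123", json.dumps({"action": "sendMessage", "message": "hi"})))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In a real deployment the dictionary writes would become DynamoDB put and delete calls, and the list append would become an SQS send.&lt;/p&gt;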

&lt;h2&gt;
  
  
  Architecture of Websocket mechanism
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;WebSocket Client:

&lt;ul&gt;
&lt;li&gt;Initiates the WebSocket connection and communicates via send() and the onmessage handler.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;API Gateway (WebSocket API):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manages WebSocket connections and invokes Lambda functions for defined routes.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Route Integration (Lambda Functions):&lt;br&gt;
Every route should have an integration. The main integration types are Mock, HTTP, and Lambda.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$connect: Adds connection metadata to DynamoDB.&lt;/li&gt;
&lt;li&gt;$disconnect: Removes connection metadata from DynamoDB.&lt;/li&gt;
&lt;li&gt;$default: Selected when the route selection expression cannot be evaluated against the message or no matching route is found.&lt;/li&gt;
&lt;li&gt;Custom routes: Selected based on message content; the integration processes the message and forwards it to SQS.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;DynamoDB:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maintains active connection records, including connectionId and associated metadata.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;SQS FIFO Queue:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Queues messages for downstream processing, ensuring delivery order and deduplication.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Downstream Services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Process messages from SQS and perform actions like notifications, data updates, or storage.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
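
&lt;p&gt;To make the routing behavior concrete, the sketch below imitates what a selection expression of &lt;code&gt;$request.body.action&lt;/code&gt; does: it reads the &lt;code&gt;action&lt;/code&gt; field from a JSON message and falls back to &lt;code&gt;$default&lt;/code&gt; when nothing matches. API Gateway performs this step itself; the &lt;code&gt;select_route&lt;/code&gt; function and the &lt;code&gt;ROUTES&lt;/code&gt; set are illustrative only.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

ROUTES = {"sendMessage"}  # custom routes defined on the API (example value)


def select_route(message_body, routes=ROUTES):
    """Imitate the selection expression $request.body.action."""
    try:
        action = json.loads(message_body).get("action")
    except (ValueError, AttributeError):
        # Non-JSON or non-object payloads cannot be evaluated.
        action = None
    if isinstance(action, str) and action in routes:
        return action
    return "$default"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A message such as &lt;code&gt;{"action": "sendMessage"}&lt;/code&gt; selects the custom route, while plain text or an unknown action ends up on &lt;code&gt;$default&lt;/code&gt;.&lt;/p&gt;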

&lt;h2&gt;
  
  
  Security
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Authentication and Authorization
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Custom Authorizer (Lambda Authorizer)&lt;br&gt;
It can only be used for the $connect route.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a Lambda Authorizer to validate custom tokens or headers sent during connection attempts.&lt;/li&gt;
&lt;li&gt;Example:

&lt;ul&gt;
&lt;li&gt;Validate a JWT token from an identity provider (e.g., Cognito, Auth0).&lt;/li&gt;
&lt;li&gt;Check the token against allowed users or roles.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Amazon Cognito:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Amazon Cognito for user authentication.&lt;/li&gt;
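&lt;li&gt;Validate the Cognito-issued tokens sent in connection requests. WebSocket APIs do not offer a built-in Cognito user pool authorizer, so this check typically runs in a Lambda authorizer on $connect.&lt;/li&gt;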
&lt;li&gt;Best suited for applications with user pools.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
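
&lt;p&gt;A Lambda authorizer for &lt;code&gt;$connect&lt;/code&gt; returns a principal and an IAM policy document. The sketch below shows that response shape; the token lookup is a deliberate placeholder (a real authorizer would verify a JWT signature and claims), and &lt;code&gt;ALLOWED_TOKENS&lt;/code&gt; and the sample ARN are hypothetical values.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;ALLOWED_TOKENS = {"demo-token": "user-123"}  # placeholder for real JWT validation


def authorizer(event, context=None):
    """Return an IAM policy allowing or denying the $connect request."""
    headers = event.get("headers") or {}
    token = headers.get("Authorization", "")
    principal = ALLOWED_TOKENS.get(token)
    return {
        "principalId": principal or "anonymous",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": "Allow" if principal else "Deny",
                    "Resource": event["methodArn"],
                }
            ],
        },
    }


sample_event = {
    "headers": {"Authorization": "demo-token"},
    "methodArn": "arn:aws:execute-api:region:account_id:api_id/prod/$connect",
}
decision = authorizer(sample_event)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because authorization happens only at connection time, the resolved principal is a natural value to store alongside the connection ID for later messages.&lt;/p&gt;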

&lt;h3&gt;
  
  
  Secure WebSocket Connections
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Always use the secure WebSocket protocol (wss://). API Gateway enforces HTTPS/TLS, ensuring encrypted communication.&lt;/li&gt;
&lt;li&gt;Associate a custom domain with the API Gateway WebSocket endpoint, and use AWS Certificate Manager (ACM) to manage its SSL/TLS certificates.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  IP Whitelisting and Blacklisting
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Attach AWS WAF to API Gateway to block or allow requests based on IP addresses or CIDR ranges. Also use rate limiting to help protect against DDoS attacks.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  API Gateway Throttling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Set rate and burst limits on API Gateway routes to limit the rate of requests from clients.&lt;/li&gt;
&lt;li&gt;Create API keys, associate them with a usage plan, and limit the number of allowed requests per API key.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Environment-based Access Control:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;We should always use distinct stages (e.g., dev, prod) and restrict connections to the production API through IP rules.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tools to test
&lt;/h2&gt;

&lt;p&gt;The following tools can be used to test a WebSocket API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Piesocket&lt;/li&gt;
&lt;li&gt;Postman&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>apigateway</category>
      <category>api</category>
    </item>
    <item>
      <title>S3 table &amp; S3 Metadata table</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 09 Dec 2024 18:26:23 +0000</pubDate>
      <link>https://dev.to/aws-builders/s3-table-s3-metadata-table-91i</link>
      <guid>https://dev.to/aws-builders/s3-table-s3-metadata-table-91i</guid>
      <description>&lt;h2&gt;
  
  
  Open table format and its architecture
&lt;/h2&gt;

&lt;p&gt;Open table formats, such as Apache Iceberg, Apache Hudi, and Delta Lake, have gained popularity in data analytics mainly because of the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ACID Transactions: Open table formats (e.g., Apache Iceberg, Delta Lake) ensure reliable and consistent data updates, even with concurrent access.&lt;/li&gt;
&lt;li&gt;Schema Evolution: They allow seamless updates to schemas without disrupting existing pipelines, simplifying data management. Metadata tracks the changes to the dataset: the files held in the data layer are captured by metadata files held in the metadata layer, and as the data files change, the metadata files track those changes.&lt;/li&gt;
&lt;li&gt;Optimized Queries: Partitioning and indexing enable faster queries by scanning only relevant data, improving performance and cost-efficiency.&lt;/li&gt;
&lt;li&gt;Time Travel: Users can access historical versions of data for debugging, compliance, or analytics.&lt;/li&gt;
&lt;li&gt;Interoperability: These formats integrate seamlessly with big data tools like Spark, Flink, and Presto, making them versatile and widely adopted.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Open file format
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkl9mm5r6t0aqp4uy7dqa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkl9mm5r6t0aqp4uy7dqa.png" alt="img" width="750" height="588"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  S3 table
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;p&gt;Amazon S3 Tables is optimized for analytics workloads. It is designed to continuously enhance query performance and reduce storage costs for tabular data, which makes it very promising if you are working with a lakehouse architecture.&lt;br&gt;
&lt;strong&gt;A new bucket type, the table bucket, has been introduced to support this; it organizes tables as sub-resources. Like any other AWS resource, it has an ARN and can take a resource policy, and as a distinguishing feature it has a dedicated endpoint.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3 Tables are intended explicitly for storing data in a tabular format, such as daily purchase transactions, streaming sensor data, or ad impressions. This data is organized into columns and rows like a database table.&lt;/li&gt;
&lt;li&gt;Table buckets support storing tables in the Apache Iceberg format. You can query these tables using standard SQL in query engines that support Iceberg.&lt;/li&gt;
&lt;li&gt;Data files and metadata files can be read and written; deleting and updating them is not allowed, to preserve data integrity.&lt;/li&gt;
&lt;li&gt;Compatible query engines include Amazon Athena, Amazon Redshift, and Apache Spark.&lt;/li&gt;
&lt;li&gt;S3 Table automatically performs maintenance tasks like compaction and snapshot management to optimize your tables for querying, including removing unreferenced files.&lt;/li&gt;
&lt;li&gt;S3 Tables offers access management at both the table and the bucket level.&lt;/li&gt;
&lt;li&gt;Fully managed Apache Iceberg tables in S3.&lt;/li&gt;
&lt;li&gt;It supports automatic compaction of the underlying files to improve query performance and tunes them further for lower latency.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  S3 Table buckets namespace
&lt;/h3&gt;

&lt;p&gt;A namespace logically groups related S3 tables together, giving us finer control over them at the namespace level. It helps with the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logical segmentation of data and multi-tenancy

&lt;ul&gt;
&lt;li&gt;Supports multi-tenancy through separate namespaces, which also helps with data isolation requirements in regulated industries.&lt;/li&gt;
&lt;li&gt;Separates tables by application, project, and so on.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Prevents naming conflicts

&lt;ul&gt;
&lt;li&gt;Each namespace acts like a "container," allowing tables with the same name in different namespaces without conflicts.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Better Access Control

&lt;ul&gt;
&lt;li&gt;Policies can grant or restrict access to specific namespaces, ensuring data security and compliance.  It also reduces the risk of unauthorized access to unrelated tables in the same bucket.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Easy data management

&lt;ul&gt;
&lt;li&gt;Makes it easier to query, update, or delete related tables in bulk.&lt;/li&gt;
&lt;li&gt;Simplifies metadata management for tables grouped under a namespace.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Advanced workflows based on namespace

&lt;ul&gt;
&lt;li&gt;It helps to simplify automation for data pipelines or real-time analytics applications.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  S3 table operation &amp;amp; management
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Table Operation&lt;/strong&gt;&lt;br&gt;
These are quite similar to CRUD operations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;List tables&lt;/li&gt;
&lt;li&gt;Create tables&lt;/li&gt;
&lt;li&gt;Get table metadata location&lt;/li&gt;
&lt;li&gt;Update table metadata location&lt;/li&gt;
&lt;li&gt;Delete Table&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Table Management&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Put Table Policy&lt;/li&gt;
&lt;li&gt;Put Table Bucket Policy&lt;/li&gt;
&lt;li&gt;Put Table Maintenance Config&lt;/li&gt;
&lt;li&gt;Put Table Bucket Maintenance Config&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Policies related to S3 table operation
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Allow access to create and use table buckets
&lt;/h3&gt;

&lt;p&gt;Here, Action lists the specific actions the policy allows. These actions are specific to S3 Tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;s3tables:CreateTableBucket: Grants permission to create a table bucket in S3 Tables.&lt;/li&gt;
&lt;li&gt;s3tables:PutTableBucketPolicy: Allows setting or updating the bucket policy for a table bucket.&lt;/li&gt;
&lt;li&gt;s3tables:GetTableBucketPolicy: Allows retrieving the bucket policy associated with a table bucket.&lt;/li&gt;
&lt;li&gt;s3tables:ListTableBuckets: Allows listing all table buckets within the specified scope.&lt;/li&gt;
&lt;li&gt;s3tables:GetTableBucket: Grants permission to access the metadata of a specific table bucket.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Resource defines the scope of the resources these actions can apply to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"arn:aws:s3tables:region:account_id:bucket/*": Specifies all table buckets in the account (account_id) and region (region).&lt;/li&gt;
&lt;li&gt;The * after bucket/ indicates that permissions apply to all buckets under this account and region.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowBucketActionsForUser",
        "Effect": "Allow",
        "Action": [
            "s3tables:CreateTableBucket",
            "s3tables:PutTableBucketPolicy",
            "s3tables:GetTableBucketPolicy",
            "s3tables:ListTableBuckets",
            "s3tables:GetTableBucket"
        ],
        "Resource": "arn:aws:s3tables:region:account_id:bucket/*"
    }]
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Allow access to create and use tables in a table bucket
&lt;/h3&gt;

&lt;p&gt;Here, Action lists the specific actions allowed by the policy, related to S3 Tables. &lt;em&gt;Note that the first policy focused on creating and managing table buckets and their policies; it did not include table-level operations such as creating tables, reading or writing data, or updating metadata. These are the operations where namespaces become relevant.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;s3tables:CreateTable: Allows creating new tables in the specified table bucket. &lt;/li&gt;
&lt;li&gt;s3tables:PutTableData: Grants permission to write data to tables within the table bucket. &lt;/li&gt;
&lt;li&gt;s3tables:GetTableData: Allows reading data from tables in the bucket.&lt;/li&gt;
&lt;li&gt;s3tables:GetTableMetadataLocation: Allows retrieving metadata location information for a table.&lt;/li&gt;
&lt;li&gt;s3tables:UpdateTableMetadataLocation: Grants permission to update the metadata location of a table. &lt;/li&gt;
&lt;li&gt;s3tables:GetNamespace: Allows retrieving namespace information associated with the table bucket. &lt;/li&gt;
&lt;li&gt;s3tables:CreateNamespace: Grants permission to create namespaces for organizing table data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Resource section specifies&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grants permissions on the bucket named amzn-s3-demo-table-bucket&lt;/li&gt;
&lt;li&gt;Grants permissions on all tables within the amzn-s3-demo-table-bucket
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
     "Version": "2012-10-17",
     "Statement": [ 
         {
             "Sid": "AllowBucketActions",
             "Effect": "Allow",
             "Action": [
                 "s3tables:CreateTable",
                 "s3tables:PutTableData",
                 "s3tables:GetTableData",
                 "s3tables:GetTableMetadataLocation",
                 "s3tables:UpdateTableMetadataLocation",
                 "s3tables:GetNamespace",
                 "s3tables:CreateNamespace"
             ],

             "Resource": [
               "arn:aws:s3tables:region:account_id:bucket/amzn-s3-demo-table-bucket",
               "arn:aws:s3tables:region:account_id:bucket/amzn-s3-demo-table-bucket/table/*"
            ]
         }
     ]
 }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Table bucket policy to allow read access to a namespace
&lt;/h4&gt;

&lt;p&gt;This policy allows reading S3 tables from a single namespace. Here, Action lists the specific actions allowed by the policy, related to S3 Tables.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;s3tables:GetTableData: Allows reading data from tables in the bucket.&lt;/li&gt;
&lt;li&gt;s3tables:GetTableMetadataLocation: Allows retrieving metadata location information for a table.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Resource section covers all S3 tables under the bucket amzn-s3-demo-table-bucket1, and the s3tables:namespace condition then restricts access to tables in the hr namespace.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
     "Version": "2012-10-17",
     "Statement": [ 
         {
             "Effect": "Allow",
             "Principal": {
               "AWS": "arn:aws:iam::123456789012:user/Jane"
             },
             "Action": [
                  "s3tables:GetTableData", 
                  "s3tables:GetTableMetadataLocation"
             ],
             "Resource": "arn:aws:s3tables:region:account_id:bucket/amzn-s3-demo-table-bucket1/table/*",
             "Condition": { 
                  "StringLike": { "s3tables:namespace": "hr" } 
             }
         }
     ]
 }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  S3 table automatic maintenance
&lt;/h2&gt;

&lt;p&gt;It provides automated maintenance through configurations that help simplify table management, optimize performance, and reduce operational overhead.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Table Lifecycle Management

&lt;ul&gt;
&lt;li&gt;We can add S3 Tables configurations that include lifecycle policies to automatically handle data expiration, transitions, or archival.&lt;/li&gt;
&lt;li&gt;Automatic snapshot expiration can be configured easily.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Data Compaction

&lt;ul&gt;
&lt;li&gt;S3 Tables automatically compacts small files (often produced by incremental writes) into larger, optimized files. This speeds up queries and reduces storage cost.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Schema Evolution

&lt;ul&gt;
&lt;li&gt;Automated checks ensure compatibility between new and existing data.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Metadata Optimization

&lt;ul&gt;
&lt;li&gt;Indexing of metadata for faster querying and retrieval of table details.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of these can be configured through policies.&lt;/p&gt;
&lt;h3&gt;
  
  
  Policy for snapshot management
&lt;/h3&gt;

&lt;p&gt;By configuring MaximumSnapshotAge, we can specify the retention period for table snapshots. The following example makes S3 Tables automatically retain only the snapshots from the last 30 days (720 hours), while always keeping at least one snapshot.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MinimumSnapshots: Ensures that at least one snapshot is always retained, regardless of age. &lt;/li&gt;
&lt;li&gt;MaximumSnapshotAge: Specifies the maximum age (in hours) for snapshots to be retained.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws s3tables put-table-maintenance-configuration \
    --table-arn arn:aws:s3tables:region:account_id:bucket/bucket_name/table/table_name \
    --maintenance-configuration '{
        "SnapshotManagement": {
            "MinimumSnapshots": 1,
            "MaximumSnapshotAge": 720
        }
    }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
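
&lt;p&gt;Because the age is expressed in hours, it is easy to slip when converting from days. Here is a small Python sketch that builds the same JSON document from a retention period in days. The helper name snapshot_management_config is hypothetical, and the key names are copied from the command above rather than verified against the service API, so check them against the current S3 Tables reference before relying on them.&lt;/p&gt;

```python
import json

HOURS_PER_DAY = 24


def snapshot_management_config(retention_days, minimum_snapshots=1):
    """Build the snapshot settings document used in the article's command.

    Hypothetical helper: the key names follow the article's example and
    are an assumption, not a verified API contract.
    """
    return {
        "SnapshotManagement": {
            "MinimumSnapshots": minimum_snapshots,
            # The setting is expressed in hours, so convert from days.
            "MaximumSnapshotAge": retention_days * HOURS_PER_DAY,
        }
    }


config = snapshot_management_config(retention_days=30)
# The serialized string can be passed as the maintenance configuration value.
print(json.dumps(config))
```

&lt;p&gt;With retention_days=30 the computed age is 720 hours, which matches the value in the command above.&lt;/p&gt;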

&lt;h2&gt;
  
  
  S3 Table Integration with AWS Analytics
&lt;/h2&gt;

&lt;p&gt;S3 Tables integrate seamlessly with AWS analytics services to enable querying, processing and insight generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Athena - Run serverless SQL queries on S3 Tables&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use AWS Glue to create a Data Catalog for S3 Tables.&lt;/li&gt;
&lt;li&gt;Query data directly using SQL in Athena.&lt;/li&gt;
&lt;li&gt;Benefit from the Apache Iceberg table format and Parquet data files for optimized performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS Glue - Automate ETL processes for S3 Tables&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Glue Crawlers to discover table metadata.&lt;/li&gt;
&lt;li&gt;Create ETL jobs to transform and load data into S3 Tables or other destinations.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  S3 Metadata table
&lt;/h2&gt;

&lt;p&gt;An S3 Metadata table captures system metadata for each object, along with object tags and user-defined metadata. The metadata is stored in an S3 table and is generated in near real time as data is created, so it becomes available for queries within minutes.&lt;/p&gt;
&lt;h3&gt;
  
  
  Use case for S3 metadata table
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Real-Time Analytics

&lt;ul&gt;
&lt;li&gt;Query the metadata efficiently to identify the relevant data partitions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Machine Learning Pipelines

&lt;ul&gt;
&lt;li&gt;Use metadata tables to filter, select, and partition data for model training.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Governance and Compliance

&lt;ul&gt;
&lt;li&gt;Track data retention and enforce lifecycle policies via metadata.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Multi-Tenant Data Applications

&lt;ul&gt;
&lt;li&gt;Use namespaces within metadata tables to logically isolate tenant data.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Data Cataloging and Discovery

&lt;ul&gt;
&lt;li&gt;Use metadata queries to identify datasets matching specific criteria.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is a sample Python function that queries the metadata table through Athena.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def query_metadata_table(criteria):

    query = f"""
        SELECT *
        FROM {DATABASE}.{TABLE}
        WHERE {criteria}
    """

    print(f"Running query: {query}")

    # Start Athena query
    response = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={'Database': DATABASE},
        ResultConfiguration={'OutputLocation': S3_OUTPUT}
    )

    query_execution_id = response['QueryExecutionId']

    # Wait for query completion
    print("Waiting for query to complete...")
    while True:
        status = athena_client.get_query_execution(QueryExecutionId=query_execution_id)
        state = status['QueryExecution']['Status']['State']
        if state in ['SUCCEEDED', 'FAILED', 'CANCELLED']:
            break
        time.sleep(2)

    if state != 'SUCCEEDED':
        raise Exception(f"Query failed with state: {state}")

    # Retrieve results
    results = athena_client.get_query_results(QueryExecutionId=query_execution_id)
    datasets = []
    for row in results['ResultSet']['Rows'][1:]:  # Skip the header row
        datasets.append([col['VarCharValue'] for col in row['Data']])

    print(f"Query returned {len(datasets)} datasets matching the criteria.")
    return datasets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
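
&lt;p&gt;The function above places the criteria string directly into the SQL text, so the caller is responsible for quoting values correctly. As a rough sketch under stated assumptions, the helper below turns a dictionary of column filters into a criteria string, rejecting suspicious column names and doubling single quotes inside string values. The name build_criteria is hypothetical, and the column names bucket and record_type in the usage line are assumptions about the metadata table schema, so adjust them to the columns your table actually has.&lt;/p&gt;

```python
import re

# Plain SQL identifiers only: letters, digits, and underscores.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_criteria(filters):
    """Turn {column: value} pairs into an ANDed SQL condition string.

    Hypothetical helper for illustration. Column names must look like
    plain identifiers, and string values get single quotes doubled,
    which is the standard SQL escape for a quote inside a literal.
    """
    clauses = []
    for column, value in filters.items():
        if not IDENTIFIER.match(column):
            raise ValueError(f"Unsafe column name: {column!r}")
        if isinstance(value, bool):
            raise TypeError("Boolean filters are not handled in this sketch")
        if isinstance(value, (int, float)):
            clauses.append(f"{column} = {value}")
        else:
            escaped = str(value).replace("'", "''")
            clauses.append(f"{column} = '{escaped}'")
    return " AND ".join(clauses)


criteria = build_criteria({"bucket": "amzn-s3-demo-bucket", "record_type": "CREATE"})
print(criteria)
```

&lt;p&gt;The resulting string can be passed to query_metadata_table(criteria). This only covers equality filters; anything more complex still has to be written and reviewed by hand.&lt;/p&gt;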



</description>
      <category>aws</category>
      <category>s3</category>
      <category>analytics</category>
      <category>serverless</category>
    </item>
  </channel>
</rss>
