<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Amit Kayal</title>
    <description>The latest articles on DEV Community by Amit Kayal (@amitkayal).</description>
    <link>https://dev.to/amitkayal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F500645%2Fe0c703c3-855c-4fbd-a1c0-b546a60c022e.png</url>
      <title>DEV Community: Amit Kayal</title>
      <link>https://dev.to/amitkayal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/amitkayal"/>
    <language>en</language>
    <item>
      <title>A Scaling Lesson Building Production-Grade Agentic AI Systems</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Tue, 19 May 2026 18:30:50 +0000</pubDate>
      <link>https://dev.to/amitkayal/a-scaling-lesson-building-production-grade-agentic-ai-systems-4kgp</link>
      <guid>https://dev.to/amitkayal/a-scaling-lesson-building-production-grade-agentic-ai-systems-4kgp</guid>
      <description>&lt;h1&gt;
  
  
  A Scaling Lesson Building Production-Grade Agentic AI Systems
&lt;/h1&gt;

&lt;p&gt;One of the early observations we had while designing enterprise AI agents was this:&lt;/p&gt;

&lt;p&gt;Giving an agent more tools does not necessarily make it smarter.&lt;/p&gt;

&lt;p&gt;In theory, the opposite sounded correct.&lt;/p&gt;

&lt;p&gt;If an agent had access to customer systems, payment systems, inventory, shipping, reporting, ticketing, email, scheduling, analytics, and internal knowledge bases — it should become more powerful and autonomous.&lt;/p&gt;

&lt;p&gt;But what we observed in real implementations was very different.&lt;/p&gt;

&lt;p&gt;The more tools we added, the more unstable the system became.&lt;/p&gt;

&lt;p&gt;Not because the model was weak.&lt;/p&gt;

&lt;p&gt;Not because the tools were poorly built.&lt;/p&gt;

&lt;p&gt;But because the agent’s decision space became too large.&lt;/p&gt;

&lt;p&gt;For every user request, the agent had to evaluate all available tools, compare descriptions, infer intent, decide sequencing, and determine the best execution path.&lt;/p&gt;

&lt;p&gt;Now imagine doing this with 18 tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer lookup&lt;/li&gt;
&lt;li&gt;Order search&lt;/li&gt;
&lt;li&gt;Refund processing&lt;/li&gt;
&lt;li&gt;Inventory checking&lt;/li&gt;
&lt;li&gt;Shipping tracking&lt;/li&gt;
&lt;li&gt;Email sending&lt;/li&gt;
&lt;li&gt;Ticket creation&lt;/li&gt;
&lt;li&gt;Knowledge base search&lt;/li&gt;
&lt;li&gt;Sentiment analysis&lt;/li&gt;
&lt;li&gt;Language translation&lt;/li&gt;
&lt;li&gt;Calendar scheduling&lt;/li&gt;
&lt;li&gt;Report generation&lt;/li&gt;
&lt;li&gt;Data export&lt;/li&gt;
&lt;li&gt;User authentication&lt;/li&gt;
&lt;li&gt;Payment processing&lt;/li&gt;
&lt;li&gt;Discount application&lt;/li&gt;
&lt;li&gt;Feedback collection&lt;/li&gt;
&lt;li&gt;Escalation routing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Initially, everything looked manageable.&lt;/p&gt;

&lt;p&gt;But as workflows became more dynamic, we started observing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;wrong tool selection,&lt;/li&gt;
&lt;li&gt;unnecessary tool chaining,&lt;/li&gt;
&lt;li&gt;higher latency,&lt;/li&gt;
&lt;li&gt;increased token usage,&lt;/li&gt;
&lt;li&gt;inconsistent execution paths,&lt;/li&gt;
&lt;li&gt;and occasional hallucinated actions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem was not intelligence.&lt;/p&gt;

&lt;p&gt;The problem was cognitive overload inside the orchestration layer.&lt;/p&gt;

&lt;p&gt;Over time, one pattern became very clear:&lt;/p&gt;

&lt;p&gt;Agents perform significantly better when their responsibility boundaries are smaller.&lt;/p&gt;

&lt;p&gt;In our experience, once an agent moves beyond roughly 4–5 actively usable tools, reliability starts dropping rapidly. Enterprise orchestration guidance now similarly recommends smaller, specialized agents instead of monolithic “super agents.”&lt;/p&gt;

&lt;p&gt;That observation changed how we started designing AI systems.&lt;/p&gt;

&lt;p&gt;Instead of building one massive “do everything” agent, we moved toward specialized agents with tightly scoped responsibilities.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;A support agent handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;customer lookup,&lt;/li&gt;
&lt;li&gt;ticket creation,&lt;/li&gt;
&lt;li&gt;escalation routing,&lt;/li&gt;
&lt;li&gt;knowledge retrieval.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A commerce agent handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;orders,&lt;/li&gt;
&lt;li&gt;refunds,&lt;/li&gt;
&lt;li&gt;discounts,&lt;/li&gt;
&lt;li&gt;payments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An operations agent handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;shipping,&lt;/li&gt;
&lt;li&gt;inventory,&lt;/li&gt;
&lt;li&gt;reporting,&lt;/li&gt;
&lt;li&gt;exports.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This immediately improved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tool accuracy,&lt;/li&gt;
&lt;li&gt;execution consistency,&lt;/li&gt;
&lt;li&gt;observability,&lt;/li&gt;
&lt;li&gt;debugging,&lt;/li&gt;
&lt;li&gt;latency,&lt;/li&gt;
&lt;li&gt;and operational trust.&lt;/li&gt;
&lt;/ul&gt;
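
&lt;p&gt;As a rough illustration, the three-agent split above could be expressed as a static tool registry. This is a minimal sketch, not our production code: the tool identifiers and the &lt;code&gt;tools_for&lt;/code&gt; and &lt;code&gt;owner_of&lt;/code&gt; helpers are hypothetical names chosen for the example.&lt;/p&gt;

```python
# Hypothetical tool names grouped by the three agents described above.
AGENT_TOOLS = {
    "support": ["customer_lookup", "ticket_creation",
                "escalation_routing", "knowledge_search"],
    "commerce": ["order_search", "refund_processing",
                 "discount_application", "payment_processing"],
    "operations": ["shipping_tracking", "inventory_check",
                   "report_generation", "data_export"],
}


def tools_for(agent_name):
    """Return only the tools this agent is allowed to see."""
    return list(AGENT_TOOLS[agent_name])


def owner_of(tool_name):
    """Find the single agent that owns a tool, or None if no agent does."""
    for agent, tools in AGENT_TOOLS.items():
        if tool_name in tools:
            return agent
    return None
```

&lt;p&gt;Each agent is handed only its own four tools, so the model never has to weigh a refund tool against a shipping tool in the same decision.&lt;/p&gt;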

&lt;p&gt;But another important learning came later.&lt;/p&gt;

&lt;p&gt;Even after distributing tools properly, systems still degraded when too many agents were active simultaneously.&lt;/p&gt;

&lt;p&gt;This is something many teams underestimate.&lt;/p&gt;

&lt;p&gt;As the number of agents increases, coordination overhead also increases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more inter-agent communication,&lt;/li&gt;
&lt;li&gt;more memory synchronization,&lt;/li&gt;
&lt;li&gt;more orchestration reasoning,&lt;/li&gt;
&lt;li&gt;more retries,&lt;/li&gt;
&lt;li&gt;more conflict resolution,&lt;/li&gt;
&lt;li&gt;and more state tracking.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At lower scale, this is manageable.&lt;/p&gt;

&lt;p&gt;At enterprise scale, it becomes a serious engineering challenge.&lt;/p&gt;

&lt;p&gt;We observed cases where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;agents started waiting on each other,&lt;/li&gt;
&lt;li&gt;orchestration layers became bottlenecks,&lt;/li&gt;
&lt;li&gt;duplicate reasoning increased token burn,&lt;/li&gt;
&lt;li&gt;cascading retries created operational instability,&lt;/li&gt;
&lt;li&gt;and observability became extremely difficult.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multi-agent systems introduce their own scaling complexity around coordination, governance, and orchestration overhead. Most production-grade architecture guidance today recommends keeping orchestration layers as simple as possible.&lt;/p&gt;

&lt;p&gt;Over time, we established a few practical rules of thumb internally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Some Practical Rules of Thumb We Follow Now
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Keep Tool Count Small Per Agent
&lt;/h4&gt;

&lt;p&gt;Our practical guideline today is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3–5 tools → ideal&lt;/li&gt;
&lt;li&gt;6–8 tools → manageable with careful prompting&lt;/li&gt;
&lt;li&gt;10+ tools → requires routing/filtering layers&lt;/li&gt;
&lt;li&gt;15+ tools → usually an architectural warning sign&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The issue is not model capability.&lt;/p&gt;

&lt;p&gt;It is decision dilution.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Every Agent Must Have One Clear Business Responsibility
&lt;/h4&gt;

&lt;p&gt;We avoid mixing domains.&lt;/p&gt;

&lt;p&gt;For example, we do not combine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;payments + support,&lt;/li&gt;
&lt;li&gt;analytics + execution,&lt;/li&gt;
&lt;li&gt;reporting + approvals,&lt;/li&gt;
&lt;li&gt;inventory + customer engagement.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The narrower the responsibility boundary, the more predictable the behavior.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Start With the Lowest Complexity Possible
&lt;/h4&gt;

&lt;p&gt;One important learning from enterprise orchestration patterns is this:&lt;/p&gt;

&lt;p&gt;Do not introduce multi-agent architecture unless the workflow genuinely requires it.&lt;/p&gt;

&lt;p&gt;Sometimes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a prompt is enough,&lt;/li&gt;
&lt;li&gt;a single agent is enough,&lt;/li&gt;
&lt;li&gt;or deterministic orchestration handles the workflow better.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not every problem needs “AI teamwork.”&lt;/p&gt;

&lt;h4&gt;
  
  
  4. Avoid Excessive Agent-to-Agent Conversations
&lt;/h4&gt;

&lt;p&gt;Agent collaboration sounds powerful in demos.&lt;/p&gt;

&lt;p&gt;But in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;every interaction increases latency,&lt;/li&gt;
&lt;li&gt;every message consumes tokens,&lt;/li&gt;
&lt;li&gt;every dependency creates failure paths.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We now aggressively reduce unnecessary conversations between agents.&lt;/p&gt;

&lt;h4&gt;
  
  
  5. Retrieval Before Reasoning
&lt;/h4&gt;

&lt;p&gt;Instead of exposing all tools to all agents, we first narrow candidates through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;semantic routing,&lt;/li&gt;
&lt;li&gt;metadata filtering,&lt;/li&gt;
&lt;li&gt;RAG-based retrieval,&lt;/li&gt;
&lt;li&gt;workflow classification.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This significantly improves tool selection accuracy and reduces reasoning load.&lt;/p&gt;
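
&lt;p&gt;A minimal sketch of this narrowing step is shown here, assuming a small in-memory catalog. The catalog entries, the &lt;code&gt;shortlist_tools&lt;/code&gt; name, and the word-overlap scoring are all assumptions made for illustration; word overlap only stands in for semantic routing, which in practice would more likely use embeddings or a classifier.&lt;/p&gt;

```python
# Hypothetical tool catalog: name mapped to a domain tag and a short description.
TOOL_CATALOG = {
    "refund_processing": {"domain": "commerce",
                          "description": "issue a refund for an order"},
    "order_search": {"domain": "commerce",
                     "description": "find an order by customer or id"},
    "shipping_tracking": {"domain": "operations",
                          "description": "track shipping status of an order"},
    "ticket_creation": {"domain": "support",
                        "description": "create a support ticket"},
}


def shortlist_tools(request, domain=None, limit=3):
    """Narrow the catalog before the model reasons over it."""
    words = set(request.lower().split())
    scored = []
    for name, meta in TOOL_CATALOG.items():
        # Metadata filter: skip tools outside the requested domain.
        if domain is not None and meta["domain"] != domain:
            continue
        # Crude relevance score: shared words between request and description.
        overlap = len(words.intersection(meta["description"].split()))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first, ties broken by name for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]
```

&lt;p&gt;Only the shortlisted tool definitions are then passed to the agent, which keeps the prompt smaller and the choice set narrow.&lt;/p&gt;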

&lt;h4&gt;
  
  
  6. Observability Is Mandatory
&lt;/h4&gt;

&lt;p&gt;Once systems become multi-agent, debugging becomes one of the hardest engineering problems.&lt;/p&gt;

&lt;p&gt;We now treat the following as first-class requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;distributed tracing,&lt;/li&gt;
&lt;li&gt;token tracking,&lt;/li&gt;
&lt;li&gt;step-level logging,&lt;/li&gt;
&lt;li&gt;execution replay,&lt;/li&gt;
&lt;li&gt;agent health monitoring,&lt;/li&gt;
&lt;li&gt;retry visibility,&lt;/li&gt;
&lt;li&gt;and orchestration graphs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without observability, production support becomes nearly impossible.&lt;/p&gt;

&lt;h4&gt;
  
  
  7. Human Escalation Is Still Critical
&lt;/h4&gt;

&lt;p&gt;One thing we intentionally avoid is trying to automate every decision.&lt;/p&gt;

&lt;p&gt;We now introduce human checkpoints for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;financial operations,&lt;/li&gt;
&lt;li&gt;policy-sensitive actions,&lt;/li&gt;
&lt;li&gt;low-confidence reasoning,&lt;/li&gt;
&lt;li&gt;and customer-impacting workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Autonomy without governance becomes operational risk.&lt;/p&gt;
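
&lt;p&gt;One way to express such a checkpoint is a small policy function that runs before any action executes. This is only a sketch under stated assumptions: the category labels, the 0.8 confidence floor, and the &lt;code&gt;needs_human_review&lt;/code&gt; name are hypothetical and would need tuning per workflow.&lt;/p&gt;

```python
import operator

# Hypothetical action categories that always require a human checkpoint.
SENSITIVE_CATEGORIES = {"financial", "policy", "customer_impacting"}

# Assumed threshold below which reasoning counts as low-confidence.
CONFIDENCE_FLOOR = 0.8


def needs_human_review(category, confidence):
    """Return True when an action should pause for human approval."""
    if category in SENSITIVE_CATEGORIES:
        return True
    # Escalate low-confidence reasoning regardless of category.
    return operator.lt(confidence, CONFIDENCE_FLOOR)
```

&lt;p&gt;Keeping the policy outside the agent prompt means the escalation rule stays deterministic and auditable, whatever the model decides.&lt;/p&gt;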

&lt;p&gt;What I increasingly believe is that the future of enterprise AI is not one giant super-agent.&lt;/p&gt;

&lt;p&gt;It is orchestrated systems of smaller specialized agents collaborating through routing, delegation, memory sharing, and controlled execution.&lt;/p&gt;

&lt;p&gt;The real engineering challenge is no longer:&lt;br&gt;
“How many tools can an agent use?”&lt;/p&gt;

&lt;p&gt;The better question is:&lt;br&gt;
“How effectively can we reduce the decision burden for each agent while keeping orchestration manageable?”&lt;/p&gt;

&lt;p&gt;That has become one of the most important scaling lessons for us while building production-grade agentic AI systems.&lt;/p&gt;

&lt;h1&gt;
  
  
  How We Are Thinking About This in Cloud Architecture
&lt;/h1&gt;

&lt;p&gt;One important realization for us was that multi-agent systems should not be treated as a single application deployment.&lt;/p&gt;

&lt;p&gt;They should be treated as distributed cloud-native systems.&lt;/p&gt;

&lt;p&gt;That changes the architecture significantly.&lt;/p&gt;

&lt;p&gt;Today, the architecture pattern we increasingly follow looks something like this:&lt;/p&gt;

&lt;h2&gt;
  
  
  Specialized Agents as Independent Services
&lt;/h2&gt;

&lt;p&gt;Each agent runs independently with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;isolated APIs,&lt;/li&gt;
&lt;li&gt;dedicated scaling,&lt;/li&gt;
&lt;li&gt;separate observability,&lt;/li&gt;
&lt;li&gt;isolated memory/context,&lt;/li&gt;
&lt;li&gt;and domain-level permissions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This reduces blast radius and improves operational governance.&lt;/p&gt;

&lt;p&gt;In AWS, this naturally aligns very well with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda,&lt;/li&gt;
&lt;li&gt;ECS/EKS,&lt;/li&gt;
&lt;li&gt;event-driven services,&lt;/li&gt;
&lt;li&gt;queues,&lt;/li&gt;
&lt;li&gt;Bedrock,&lt;/li&gt;
&lt;li&gt;and serverless orchestration patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I personally liked while evaluating newer AWS patterns is how Amazon Bedrock AgentCore is trying to standardize several production concerns around agents. Instead of teams writing custom orchestration glue repeatedly, AgentCore is introducing managed capabilities around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;runtime isolation,&lt;/li&gt;
&lt;li&gt;observability,&lt;/li&gt;
&lt;li&gt;memory,&lt;/li&gt;
&lt;li&gt;identity,&lt;/li&gt;
&lt;li&gt;tool gateways,&lt;/li&gt;
&lt;li&gt;and orchestration patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One thing I strongly relate to from practical experience is this:&lt;/p&gt;

&lt;p&gt;Building the reasoning layer is usually not the hardest part anymore.&lt;/p&gt;

&lt;p&gt;The harder part is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;orchestration,&lt;/li&gt;
&lt;li&gt;debugging,&lt;/li&gt;
&lt;li&gt;tracing,&lt;/li&gt;
&lt;li&gt;retries,&lt;/li&gt;
&lt;li&gt;governance,&lt;/li&gt;
&lt;li&gt;and operational scalability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is where systems usually become unstable at scale.&lt;/p&gt;

&lt;p&gt;AgentCore Observability is also moving in an interesting direction by treating agent execution visibility as a first-class production capability with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;execution tracing,&lt;/li&gt;
&lt;li&gt;token monitoring,&lt;/li&gt;
&lt;li&gt;latency tracking,&lt;/li&gt;
&lt;li&gt;tool usage visibility,&lt;/li&gt;
&lt;li&gt;and CloudWatch integration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you have multiple agents collaborating dynamically, you need visibility into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why a tool was selected,&lt;/li&gt;
&lt;li&gt;which agent delegated the task,&lt;/li&gt;
&lt;li&gt;what context was shared,&lt;/li&gt;
&lt;li&gt;where retries happened,&lt;/li&gt;
&lt;li&gt;and why execution paths changed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without this, production debugging becomes extremely difficult.&lt;/p&gt;

&lt;p&gt;Another pattern we increasingly prefer is asynchronous orchestration.&lt;/p&gt;

&lt;p&gt;Instead of tightly coupling agents synchronously, we now lean more toward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;queues,&lt;/li&gt;
&lt;li&gt;events,&lt;/li&gt;
&lt;li&gt;workflow engines,&lt;/li&gt;
&lt;li&gt;and loosely coupled communication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It improves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;resilience,&lt;/li&gt;
&lt;li&gt;scalability,&lt;/li&gt;
&lt;li&gt;retry handling,&lt;/li&gt;
&lt;li&gt;and fault isolation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most importantly, it prevents one overloaded agent from slowing down the entire system.&lt;/p&gt;
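
&lt;p&gt;The idea can be sketched in-process with one queue per agent. This is an illustrative example using Python's &lt;code&gt;asyncio&lt;/code&gt;, not a description of our deployment; in a cloud setup the queues would typically be managed services, and the agent names, task strings, and &lt;code&gt;None&lt;/code&gt; shutdown sentinel are assumptions.&lt;/p&gt;

```python
import asyncio


async def agent_worker(name, inbox, results):
    """Consume tasks from this agent's own queue until a None sentinel arrives."""
    while True:
        task = await inbox.get()
        if task is None:
            break
        # Placeholder for real agent work (model call, tool call, and so on).
        await asyncio.sleep(0)
        results.append((name, task))


async def run_demo():
    # One queue per agent, so a slow agent only backs up its own inbox.
    inboxes = {"support": asyncio.Queue(), "commerce": asyncio.Queue()}
    results = []
    workers = [
        asyncio.create_task(agent_worker(name, inbox, results))
        for name, inbox in inboxes.items()
    ]
    await inboxes["support"].put("create ticket")
    await inboxes["commerce"].put("process refund")
    for inbox in inboxes.values():
        await inbox.put(None)  # signal shutdown
    await asyncio.gather(*workers)
    return sorted(results)
```

&lt;p&gt;Calling &lt;code&gt;asyncio.run(run_demo())&lt;/code&gt; drives both workers to completion. Producers never wait on a specific agent, only on its queue.&lt;/p&gt;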

</description>
      <category>agents</category>
      <category>aws</category>
      <category>agentcore</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Technical debt handling</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 11 May 2026 13:15:12 +0000</pubDate>
      <link>https://dev.to/amitkayal/technical-debt-handling-38on</link>
      <guid>https://dev.to/amitkayal/technical-debt-handling-38on</guid>
      <description>&lt;p&gt;Over the years, my opinion on technical debt has changed a lot. Earlier, I used to think technical debt meant bad engineering decisions.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Now I think differently&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In product companies, especially fast-moving SaaS and AI products, some level of technical debt is unavoidable. If teams try to make everything perfect from day one, they usually move too slowly.&lt;br&gt;
The real problem is not technical debt.&lt;br&gt;
The real problem is when nobody knows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why the shortcut was taken&lt;/li&gt;
&lt;li&gt;how long it can survive&lt;/li&gt;
&lt;li&gt;what impact it will create later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Personally, I look at technical debt in 3 broad categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strategic debt: Shortcuts taken consciously to move faster, validate ideas, or release quickly.&lt;/li&gt;
&lt;li&gt;Operational debt: Things that slowly start hurting deployments, production stability, debugging, support effort, and developer productivity.&lt;/li&gt;
&lt;li&gt;Architectural debt: This is the one that becomes dangerous over time. Scaling becomes harder, integrations become messy, releases become slower, and every new feature starts feeling more expensive to build.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I feel AI products make this even more complicated. In normal SaaS systems, debt usually impacts engineering speed. But in AI systems, technical debt can directly affect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;response quality&lt;/li&gt;
&lt;li&gt;hallucination handling&lt;/li&gt;
&lt;li&gt;latency&lt;/li&gt;
&lt;li&gt;observability&lt;/li&gt;
&lt;li&gt;model cost&lt;/li&gt;
&lt;li&gt;evaluation consistency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And because AI systems are probabilistic, debugging becomes much harder compared to traditional software.&lt;/p&gt;

&lt;p&gt;I’ve also seen SaaS platforms suffer heavily from invisible debt because of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multi-tenant complexity&lt;/li&gt;
&lt;li&gt;customer-specific customizations&lt;/li&gt;
&lt;li&gt;integrations&lt;/li&gt;
&lt;li&gt;deployment dependencies&lt;/li&gt;
&lt;li&gt;security and compliance requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One weak architectural decision early on can create pain for years.&lt;/p&gt;

&lt;p&gt;That’s why I personally prefer making technical debt visible and measurable instead of treating it as a future problem.&lt;/p&gt;

&lt;p&gt;Some of the signals I usually watch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deployment friction&lt;/li&gt;
&lt;li&gt;rollback frequency&lt;/li&gt;
&lt;li&gt;incident trends&lt;/li&gt;
&lt;li&gt;onboarding difficulty for new engineers&lt;/li&gt;
&lt;li&gt;release confidence&lt;/li&gt;
&lt;li&gt;overall engineering velocity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One pattern I’ve noticed repeatedly:&lt;br&gt;
When team size keeps increasing but delivery speed keeps dropping, technical debt is already affecting the organization.&lt;/p&gt;

</description>
      <category>design</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Learnings while working with long-running AI agents</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 11 May 2026 13:12:53 +0000</pubDate>
      <link>https://dev.to/amitkayal/learnings-while-working-with-long-running-ai-agents-pi9</link>
      <guid>https://dev.to/amitkayal/learnings-while-working-with-long-running-ai-agents-pi9</guid>
      <description>&lt;p&gt;One of my biggest learnings while working with long-running AI agents is that logging and progress reporting are not optional features when the agent is tightly coupled with a UI — they are part of the product experience itself.&lt;/p&gt;

&lt;p&gt;Initially, I used to think of logging mainly from a debugging or engineering perspective. But with agentic systems, especially long-running workflows involving multiple tools, reasoning steps, APIs, retries, or multi-agent coordination, I realized users experience “silence” very differently than they do in traditional applications.&lt;br&gt;
When an agent takes 30 seconds, 2 minutes, or longer without visible progress, users immediately start questioning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the system stuck?&lt;/li&gt;
&lt;li&gt;Did my request fail?&lt;/li&gt;
&lt;li&gt;Is it doing the wrong thing?&lt;/li&gt;
&lt;li&gt;Should I refresh or retry?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That uncertainty destroys trust very quickly.&lt;br&gt;
I learned that users do not just want the final answer — they want confidence that the system is actively working toward the answer. Progress visibility creates psychological assurance. Even simple updates like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Analyzing uploaded documents…”&lt;/li&gt;
&lt;li&gt;“Fetching data from CRM…”&lt;/li&gt;
&lt;li&gt;“Generating recommendations…”&lt;/li&gt;
&lt;li&gt;“Validating final response…”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;dramatically improve user confidence and patience.&lt;br&gt;
Another major realization was that long-running agents are fundamentally non-deterministic systems. Unlike traditional APIs, agents can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;take different execution paths,&lt;/li&gt;
&lt;li&gt;loop through reasoning,&lt;/li&gt;
&lt;li&gt;invoke tools dynamically,&lt;/li&gt;
&lt;li&gt;retry failed steps,&lt;/li&gt;
&lt;li&gt;or spend time resolving ambiguity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without structured logging and traceability, debugging becomes extremely difficult because the same input may not always produce the same internal execution path. Modern AI observability practices emphasize tracing tool calls, reasoning paths, latency, token usage, and execution flow because agent behavior is inherently complex and probabilistic.&lt;/p&gt;

&lt;p&gt;I also learned that progress reporting is not only for users — it becomes equally important for engineering and operational visibility. Once agents move into production, observability helps teams identify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;where workflows slow down,&lt;/li&gt;
&lt;li&gt;which tool calls fail,&lt;/li&gt;
&lt;li&gt;why latency spikes happen,&lt;/li&gt;
&lt;li&gt;and where hallucinations or execution deviations originate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One practical lesson I learned is that UI-integrated agents should expose execution state intentionally, not dump raw logs. There is a difference between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;engineering telemetry,&lt;/li&gt;
&lt;li&gt;operational traces,&lt;/li&gt;
&lt;li&gt;and user-friendly progress communication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Users need understandable milestones, while engineers need deep execution traces.&lt;br&gt;
Another important learning was around perceived performance. In many cases, improving progress visibility improved user satisfaction more than reducing actual latency. A 90-second process with clear step-by-step reporting often feels faster and more reliable than a silent 40-second execution.&lt;/p&gt;
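
&lt;p&gt;The separation between user-facing milestones and engineering traces can be sketched with a generator. This is a minimal example under assumptions: the step identifiers, the &lt;code&gt;agent.trace&lt;/code&gt; logger name, and the &lt;code&gt;run_with_progress&lt;/code&gt; function are hypothetical, and the actual agent work is omitted.&lt;/p&gt;

```python
import logging

logger = logging.getLogger("agent.trace")

# Hypothetical workflow steps: internal step id mapped to a user-facing milestone.
STEPS = [
    ("parse_documents", "Analyzing uploaded documents…"),
    ("crm_fetch", "Fetching data from CRM…"),
    ("generate", "Generating recommendations…"),
    ("validate", "Validating final response…"),
]


def run_with_progress(request_id):
    """Yield user-friendly milestones while logging detailed traces separately."""
    total = len(STEPS)
    for index, (step_id, milestone) in enumerate(STEPS, start=1):
        # Engineering telemetry: structured detail that users never see.
        logger.info("request=%s step=%s index=%d/%d",
                    request_id, step_id, index, total)
        # User-facing progress: a short, readable milestone.
        yield {"step": index, "total": total, "message": milestone}
```

&lt;p&gt;A UI layer could stream each yielded dictionary to the client, while the log records flow to whatever tracing backend the team already uses.&lt;/p&gt;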

&lt;p&gt;Today, I strongly believe that for long-running AI agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;logging is part of reliability,&lt;/li&gt;
&lt;li&gt;progress reporting is part of UX,&lt;/li&gt;
&lt;li&gt;and observability is part of trust.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>genai</category>
      <category>agents</category>
      <category>aws</category>
    </item>
    <item>
      <title>Building a Hybrid AWS Microservices Platform with API Gateway, Lambda, ECS, and Load Balancers</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 20 Apr 2026 18:41:39 +0000</pubDate>
      <link>https://dev.to/amitkayal/building-a-hybrid-aws-microservices-platform-with-api-gateway-lambda-ecs-and-load-balancers-mnn</link>
      <guid>https://dev.to/amitkayal/building-a-hybrid-aws-microservices-platform-with-api-gateway-lambda-ecs-and-load-balancers-mnn</guid>
      <description>&lt;h1&gt;
  
  
  Building a Hybrid AWS Microservices Platform with API Gateway, Lambda, ECS, and Load Balancers
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;When teams start splitting a large backend into smaller services, the first infrastructure question is usually not "How do we build a microservice?" but "How do we expose many different services safely, consistently, and without creating a networking mess?"&lt;/p&gt;

&lt;p&gt;Our architecture provides a practical answer to that problem using a hybrid AWS design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API Gateway as the front door&lt;/li&gt;
&lt;li&gt;Lambda for lightweight serverless capabilities and supporting workflows&lt;/li&gt;
&lt;li&gt;ECS Fargate for containerized business services&lt;/li&gt;
&lt;li&gt;Internal load balancers for private service routing&lt;/li&gt;
&lt;li&gt;Terraform for repeatable, staged infrastructure delivery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important architectural idea is separation of concerns. Public access, authentication, routing, container execution, and service discovery are all handled by different layers. That keeps the platform easier to scale and much easier to evolve as the number of services grows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Pattern
&lt;/h2&gt;

&lt;p&gt;At a high level, the platform follows this flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A client sends an HTTPS request to API Gateway.&lt;/li&gt;
&lt;li&gt;API Gateway applies request-level controls such as API key enforcement, CORS behavior, and route matching.&lt;/li&gt;
&lt;li&gt;The request is sent either to a Lambda-backed endpoint or to a private containerized service.&lt;/li&gt;
&lt;li&gt;For ECS services, traffic goes through a VPC Link into internal load balancing.&lt;/li&gt;
&lt;li&gt;The load balancer forwards the request to the correct ECS service based on path rules.&lt;/li&gt;
&lt;li&gt;ECS Fargate runs one or more healthy tasks for that service and returns the response.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This gives a single API surface to consumers while allowing the backend implementation to vary by use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Combine Lambda and ECS?
&lt;/h2&gt;

&lt;p&gt;A platform like this benefits from using both compute models rather than forcing every workload into one.&lt;/p&gt;

&lt;p&gt;Lambda is a strong fit for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lightweight request handlers&lt;/li&gt;
&lt;li&gt;event-driven tasks&lt;/li&gt;
&lt;li&gt;simple orchestration&lt;/li&gt;
&lt;li&gt;platform support functions&lt;/li&gt;
&lt;li&gt;endpoints that do not need a full container lifecycle&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ECS Fargate is a better fit for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;long-lived HTTP microservices&lt;/li&gt;
&lt;li&gt;containerized frameworks and dependencies&lt;/li&gt;
&lt;li&gt;services that need more predictable runtime behavior&lt;/li&gt;
&lt;li&gt;APIs that benefit from load balancing, health checks, and horizontal scaling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In our architecture, the design supports both. Some APIs are routed to Lambda-based services, while others are routed to ECS services defined through service configuration. That hybrid model is useful in real organizations because not all services have the same runtime needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Three-Stage Infrastructure Model
&lt;/h2&gt;

&lt;p&gt;One of the strongest ideas in our architecture is the staged Terraform layout. Instead of deploying everything together, the infrastructure is split into three layers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 1: Networking
&lt;/h3&gt;

&lt;p&gt;The first stage establishes the network foundation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VPC selection or creation&lt;/li&gt;
&lt;li&gt;public and private subnet discovery or provisioning&lt;/li&gt;
&lt;li&gt;internal Network Load Balancer&lt;/li&gt;
&lt;li&gt;internal Application Load Balancer&lt;/li&gt;
&lt;li&gt;VPC Link for API Gateway&lt;/li&gt;
&lt;li&gt;ECS task security group&lt;/li&gt;
&lt;li&gt;ALB log storage and network observability components&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This stage is intentionally infrastructure-only. No application services are deployed here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 2: Compute
&lt;/h3&gt;

&lt;p&gt;The second stage provisions the actual execution environment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ECS cluster on Fargate&lt;/li&gt;
&lt;li&gt;ECR repositories for service images&lt;/li&gt;
&lt;li&gt;target groups per service&lt;/li&gt;
&lt;li&gt;ALB listener and listener rules&lt;/li&gt;
&lt;li&gt;ECS service definitions&lt;/li&gt;
&lt;li&gt;CloudWatch log groups&lt;/li&gt;
&lt;li&gt;Lambda functions used by the platform&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This stage consumes outputs from the networking stage so the compute layer never hardcodes network assumptions in its own design.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 3: API Gateways
&lt;/h3&gt;

&lt;p&gt;The third stage exposes services through API Gateway:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a public API for internet-facing consumption&lt;/li&gt;
&lt;li&gt;a private API for VPC-only access&lt;/li&gt;
&lt;li&gt;route creation from service metadata&lt;/li&gt;
&lt;li&gt;VPC Link integrations for containerized services&lt;/li&gt;
&lt;li&gt;Lambda proxy integrations for Lambda-backed services&lt;/li&gt;
&lt;li&gt;API keys, usage plans, and stage configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This split is operationally important. Teams can change routing without rebuilding networking, and they can add services without redesigning the entire platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Request Path for ECS Services
&lt;/h2&gt;

&lt;p&gt;For containerized microservices, the implementation follows a private ingress model.&lt;/p&gt;

&lt;p&gt;The path is:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Client -&amp;gt; API Gateway -&amp;gt; VPC Link -&amp;gt; internal NLB -&amp;gt; internal ALB -&amp;gt; ECS service -&amp;gt; ECS task&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That may look like one hop too many at first, but each layer has a purpose.&lt;/p&gt;

&lt;h3&gt;
  
  
  API Gateway
&lt;/h3&gt;

&lt;p&gt;API Gateway is the public control plane. It handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TLS termination at the edge&lt;/li&gt;
&lt;li&gt;route exposure&lt;/li&gt;
&lt;li&gt;API key enforcement&lt;/li&gt;
&lt;li&gt;request and header mapping&lt;/li&gt;
&lt;li&gt;CORS handling&lt;/li&gt;
&lt;li&gt;stage-based deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It gives consumers a stable API contract while keeping the backend private.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why a VPC Link Is Used
&lt;/h3&gt;

&lt;p&gt;ECS services are not exposed directly to the internet. Instead, API Gateway connects privately into the VPC using a VPC Link. That allows the public API layer to reach internal services without making the services themselves public.&lt;/p&gt;

&lt;p&gt;This is a strong security pattern because the application runtime stays inside the VPC, but consumers still get a clean managed API endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why the Architecture Uses Both NLB and ALB
&lt;/h3&gt;

&lt;p&gt;A useful implementation detail in our architecture is that the VPC Link targets an internal Network Load Balancer, and that NLB forwards to an internal Application Load Balancer.&lt;/p&gt;

&lt;p&gt;This arrangement provides two separate benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The NLB is used as the stable target for the API Gateway VPC Link.&lt;/li&gt;
&lt;li&gt;The ALB performs path-based routing to the actual microservices.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ALB is what makes many ECS services practical behind one internal entry point. Each service gets its own listener rule and target group, so the platform can route based on URL path rather than provisioning a separate load balancer per service.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Load Balancing Works
&lt;/h2&gt;

&lt;p&gt;The load-balancing model is service-oriented.&lt;/p&gt;

&lt;p&gt;Each ECS microservice contributes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a base API path&lt;/li&gt;
&lt;li&gt;an ALB path pattern&lt;/li&gt;
&lt;li&gt;a listener rule priority&lt;/li&gt;
&lt;li&gt;a container port&lt;/li&gt;
&lt;li&gt;a health check definition&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From that metadata, Terraform creates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one target group per service&lt;/li&gt;
&lt;li&gt;one listener rule per service&lt;/li&gt;
&lt;li&gt;one ECS service per service&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means the routing layer is not manually duplicated for every new microservice. The service declares its path and runtime settings, and the platform generates the infrastructure around it.&lt;/p&gt;
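
&lt;p&gt;One way to express that generation step is a &lt;code&gt;for_each&lt;/code&gt; over a map of service metadata. The sketch below is illustrative only: the &lt;code&gt;services&lt;/code&gt; variable, its field names, and the health check thresholds are assumptions, not the repository's real schema.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical inputs: var.vpc_id and var.alb_listener_arn are assumed
# to come from earlier Terraform stages.
variable "services" {
  type = map(object({
    path_pattern      = string
    listener_priority = number
    container_port    = number
    health_check_path = string
  }))
}

# One target group per service, registering Fargate tasks by IP.
resource "aws_lb_target_group" "service" {
  for_each    = var.services
  name        = "${each.key}-tg"
  port        = each.value.container_port
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = var.vpc_id

  health_check {
    path                = each.value.health_check_path
    matcher             = "200"
    interval            = 30
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }
}

# One path-based listener rule per service on the shared ALB listener.
resource "aws_lb_listener_rule" "service" {
  for_each     = var.services
  listener_arn = var.alb_listener_arn
  priority     = each.value.listener_priority

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.service[each.key].arn
  }

  condition {
    path_pattern {
      values = [each.value.path_pattern]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With this shape, onboarding a service means adding one map entry. Listener priorities must be unique per listener, so they are worth validating in the metadata itself.&lt;/p&gt;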

&lt;h3&gt;
  
  
  Target Groups
&lt;/h3&gt;

&lt;p&gt;Each target group points to ECS tasks using IP targets. That is the correct choice for Fargate because each task runs with its own elastic network interface rather than on shared EC2 hosts.&lt;/p&gt;

&lt;p&gt;The target groups in this repository also use application-level health checks. A task is considered healthy only when its service endpoint responds successfully on the configured health path.&lt;/p&gt;

&lt;p&gt;That matters because container startup is not the same as application readiness. A service may be running from ECS's perspective but still not ready to receive traffic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Listener Rules
&lt;/h3&gt;

&lt;p&gt;The ALB listener is configured once, and each service gets a path-based rule. For example, a service under a quoting path can be matched independently from a service under a product-pricing path.&lt;/p&gt;

&lt;p&gt;This keeps the routing layer centralized and avoids deploying a dedicated ALB per service, which would become expensive and operationally noisy as the platform grows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Health Checks and Traffic Protection
&lt;/h3&gt;

&lt;p&gt;The repository uses health checks in multiple places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API health endpoints at the application level&lt;/li&gt;
&lt;li&gt;ALB target group health checks&lt;/li&gt;
&lt;li&gt;ECS service health grace periods&lt;/li&gt;
&lt;li&gt;container health checks inside the task definition&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That layered approach improves resilience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unhealthy tasks are removed from target groups&lt;/li&gt;
&lt;li&gt;ECS replaces failed tasks&lt;/li&gt;
&lt;li&gt;API Gateway continues to route through the same private entry point&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a platform that can recover from task-level failures without changing the public API contract.&lt;/p&gt;

&lt;h2&gt;
  
  
  How ECS Is Structured
&lt;/h2&gt;

&lt;p&gt;The ECS side of the platform is built for repeatability rather than one-off service definitions.&lt;/p&gt;

&lt;h3&gt;
  
  
  ECS Cluster
&lt;/h3&gt;

&lt;p&gt;The platform provisions a shared ECS cluster per environment. That allows multiple microservices to run within the same operational boundary while still being isolated at the task and service level.&lt;/p&gt;

&lt;p&gt;The cluster uses Fargate, which removes the need to manage EC2 worker nodes. This simplifies operations significantly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no patching of container hosts&lt;/li&gt;
&lt;li&gt;no cluster capacity management at the instance level&lt;/li&gt;
&lt;li&gt;easier scaling by task count&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Reusable ECS Service Module
&lt;/h3&gt;

&lt;p&gt;Instead of defining each ECS service from scratch, the repository uses a reusable Terraform module for service deployment.&lt;/p&gt;

&lt;p&gt;That module is responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;task definition creation&lt;/li&gt;
&lt;li&gt;container logging configuration&lt;/li&gt;
&lt;li&gt;IAM role wiring&lt;/li&gt;
&lt;li&gt;ECS service creation&lt;/li&gt;
&lt;li&gt;target group attachment&lt;/li&gt;
&lt;li&gt;subnet and security group placement&lt;/li&gt;
&lt;li&gt;optional capacity provider strategy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a strong platform choice. It makes service onboarding consistent and reduces drift between services.&lt;/p&gt;
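
&lt;p&gt;The core of such a module might look roughly like the following. This is a minimal sketch under assumed variable names, not the module as it exists in the repository. It uses &lt;code&gt;launch_type&lt;/code&gt; for brevity, whereas a module offering an optional capacity provider strategy would set &lt;code&gt;capacity_provider_strategy&lt;/code&gt; instead, since the two cannot be combined.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# All var.* inputs are hypothetical module variables.
resource "aws_ecs_service" "this" {
  name            = var.service_name
  cluster         = var.cluster_arn
  task_definition = var.task_definition_arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  # Give the application time to start before ALB health checks count.
  health_check_grace_period_seconds = 60

  network_configuration {
    subnets          = var.subnet_ids
    security_groups  = [var.task_security_group_id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = var.target_group_arn
    container_name   = "app" # must match the container name in the task definition
    container_port   = var.container_port
  }
}
&lt;/code&gt;&lt;/pre&gt;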

&lt;h3&gt;
  
  
  Task Definitions
&lt;/h3&gt;

&lt;p&gt;Each service runs as a Fargate task with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a named container image from ECR&lt;/li&gt;
&lt;li&gt;CPU and memory settings&lt;/li&gt;
&lt;li&gt;environment variables&lt;/li&gt;
&lt;li&gt;a health check command&lt;/li&gt;
&lt;li&gt;CloudWatch logging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repository also includes support for an additional X-Ray sidecar container in the task definition pattern, which is useful for distributed tracing in a microservice environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Network Mode
&lt;/h3&gt;

&lt;p&gt;Tasks run with &lt;code&gt;awsvpc&lt;/code&gt; networking, which gives each task its own network interface and private IP. This is the only network mode Fargate supports, and it is what allows ALB target groups to use IP mode cleanly.&lt;/p&gt;
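
&lt;p&gt;Putting the task definition pieces together, a Fargate task along these lines could be declared as shown below. The family name, port, sizes, health path, and variables are placeholders chosen for illustration, and the health check command assumes &lt;code&gt;curl&lt;/code&gt; is present in the image.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical values; var.* inputs are assumed to exist.
resource "aws_ecs_task_definition" "service" {
  family                   = "quote-service"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = var.execution_role_arn

  container_definitions = jsonencode([
    {
      name         = "app"
      image        = "${var.ecr_repository_url}:${var.image_tag}"
      essential    = true
      portMappings = [{ containerPort = 8080, protocol = "tcp" }]
      environment  = [{ name = "APP_ENV", value = "dev" }]

      # Container-level health check, separate from the ALB health check.
      healthCheck = {
        command     = ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 30
      }

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = "/ecs/quote-service"
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "app"
        }
      }
    }
  ])
}
&lt;/code&gt;&lt;/pre&gt;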

&lt;h2&gt;
  
  
  Subnet and Security Group Design
&lt;/h2&gt;

&lt;p&gt;This repository supports both existing/default VPC usage and a more segmented custom VPC model.&lt;/p&gt;

&lt;p&gt;That flexibility matters because many teams start in a default-VPC or dev-friendly setup and later move to stricter network isolation for staging and production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Subnet Placement
&lt;/h3&gt;

&lt;p&gt;The network layer discovers public and private subnets where available. In a custom VPC, the design supports proper private subnet deployment. In a simpler default VPC setup, the platform can fall back to available public subnets when private ones are not present.&lt;/p&gt;

&lt;p&gt;This is an important operational nuance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;development environments often optimize for simplicity&lt;/li&gt;
&lt;li&gt;higher environments usually optimize for stricter isolation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repository is built to handle both.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security Groups
&lt;/h3&gt;

&lt;p&gt;The security model follows least-privilege intent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ECS tasks accept application traffic from the internal load-balancing layer&lt;/li&gt;
&lt;li&gt;services are not directly internet-facing&lt;/li&gt;
&lt;li&gt;API Gateway reaches backend services through private network integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps the application tier out of direct public exposure while still allowing a public API facade.&lt;/p&gt;
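
&lt;p&gt;In Terraform, the first rule in that list is commonly expressed by referencing the load balancer's security group as the traffic source instead of a CIDR range. The sketch below assumes both security groups already exist and that the service listens on port 8080. Both are illustrative choices.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical inputs: var.task_security_group_id, var.alb_security_group_id.
# Only traffic originating from the internal ALB may reach the tasks.
resource "aws_security_group_rule" "tasks_from_alb" {
  type                     = "ingress"
  security_group_id        = var.task_security_group_id
  source_security_group_id = var.alb_security_group_id
  protocol                 = "tcp"
  from_port                = 8080
  to_port                  = 8080
}
&lt;/code&gt;&lt;/pre&gt;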

&lt;h2&gt;
  
  
  Config-Driven Service Onboarding
&lt;/h2&gt;

&lt;p&gt;One of the most scalable ideas in our architecture is that services are registered through configuration rather than by handcrafting infrastructure every time.&lt;/p&gt;

&lt;p&gt;There is a master service registry that lists enabled services per environment, and each service provides its own deployment metadata, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;service identity&lt;/li&gt;
&lt;li&gt;container port&lt;/li&gt;
&lt;li&gt;desired task count&lt;/li&gt;
&lt;li&gt;CPU and memory&lt;/li&gt;
&lt;li&gt;API base path&lt;/li&gt;
&lt;li&gt;ALB path pattern&lt;/li&gt;
&lt;li&gt;listener priority&lt;/li&gt;
&lt;li&gt;health check behavior&lt;/li&gt;
&lt;li&gt;logging retention&lt;/li&gt;
&lt;li&gt;autoscaling preferences&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This creates a platform model rather than a collection of unrelated microservices.&lt;/p&gt;
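
&lt;p&gt;To make the metadata concrete, a single registry entry could be shaped something like this. The format, key names, and values are purely hypothetical, because the actual registry layout is not shown here. The point is only that one entry carries everything the platform needs.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical registry entry; field names are illustrative.
locals {
  services = {
    "quote-service" = {
      container_port     = 8080
      desired_count      = 2
      cpu                = 256
      memory             = 512
      api_base_path      = "quotes"
      alb_path_pattern   = "/quotes/*"
      listener_priority  = 100
      health_check_path  = "/quotes/health"
      log_retention_days = 30
      autoscaling        = { min = 2, max = 6, cpu_target = 60 }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;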

&lt;p&gt;Adding a new service becomes a repeatable process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create the service.&lt;/li&gt;
&lt;li&gt;Define its configuration.&lt;/li&gt;
&lt;li&gt;Register it in the service catalog.&lt;/li&gt;
&lt;li&gt;Build and publish the image.&lt;/li&gt;
&lt;li&gt;Apply Terraform stages.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is much easier to maintain than cloning infrastructure blocks over and over.&lt;/p&gt;

&lt;h2&gt;
  
  
  Container Delivery with ECR
&lt;/h2&gt;

&lt;p&gt;For ECS workloads, the container supply chain is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build the service image.&lt;/li&gt;
&lt;li&gt;Push it to an ECR repository.&lt;/li&gt;
&lt;li&gt;Reference the tagged image in the ECS task definition.&lt;/li&gt;
&lt;li&gt;Update the ECS service to roll out the new task definition.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Our platform provisions one ECR repository per service, with image scanning enabled. That is a good baseline for a microservices platform because it keeps artifacts separated by service while still following a common naming convention.&lt;/p&gt;
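
&lt;p&gt;A per-service repository with scanning enabled can be declared compactly. The variable names and naming convention in this sketch are assumptions.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical inputs: var.service_names (list of strings), var.name_prefix.
resource "aws_ecr_repository" "service" {
  for_each = toset(var.service_names)
  name     = "${var.name_prefix}/${each.key}"

  image_scanning_configuration {
    scan_on_push = true
  }
}
&lt;/code&gt;&lt;/pre&gt;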

&lt;p&gt;There is also an explicit deployment phase between infrastructure provisioning and API exposure where container images are built and pushed. That is a practical real-world step many diagrams omit, but it is essential because ECS cannot run a service until the image exists in the registry.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Lambda Fits into the Platform
&lt;/h2&gt;

&lt;p&gt;Lambda is used here as a first-class platform option, not as an afterthought.&lt;/p&gt;

&lt;p&gt;There are two useful Lambda patterns in our architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Lambda as an API Backend
&lt;/h3&gt;

&lt;p&gt;Some services can be exposed through API Gateway using Lambda proxy integration. This is ideal for capabilities that are naturally event-driven, lightweight, or operationally simpler as functions than as always-on containers.&lt;/p&gt;

&lt;p&gt;In this model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API Gateway owns the route&lt;/li&gt;
&lt;li&gt;Lambda executes the business logic&lt;/li&gt;
&lt;li&gt;API Gateway returns the Lambda response directly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This avoids unnecessary load-balancer and container overhead for smaller workloads.&lt;/p&gt;
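
&lt;p&gt;For reference, a Lambda proxy integration on a REST API typically needs two pieces: the integration itself and a permission that lets API Gateway invoke the function. The following is a sketch with assumed variables, and it presumes a matching &lt;code&gt;aws_api_gateway_method&lt;/code&gt; already exists for the route.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# All var.* inputs are hypothetical.
resource "aws_api_gateway_integration" "lambda_proxy" {
  rest_api_id             = var.rest_api_id
  resource_id             = var.resource_id
  http_method             = "POST" # the client-facing method on the route
  integration_http_method = "POST" # API Gateway always calls Lambda with POST
  type                    = "AWS_PROXY"
  uri                     = var.lambda_invoke_arn
}

resource "aws_lambda_permission" "apigw" {
  statement_id  = "AllowInvokeFromApiGateway"
  action        = "lambda:InvokeFunction"
  function_name = var.lambda_function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${var.rest_api_execution_arn}/*/*"
}
&lt;/code&gt;&lt;/pre&gt;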

&lt;h3&gt;
  
  
  2. Lambda as a Platform Support Function
&lt;/h3&gt;

&lt;p&gt;Our architecture also provisions Lambda functions that support the overall platform, such as authentication-related or onboarding-related workflows.&lt;/p&gt;

&lt;p&gt;This is a smart use of Lambda in a hybrid platform because not every supporting concern needs to run inside ECS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Authentication and API Protection
&lt;/h2&gt;

&lt;p&gt;Our architecture clearly treats API protection as an API Gateway concern.&lt;/p&gt;

&lt;p&gt;The current public API implementation enforces API key usage through API Gateway methods, API keys, and usage plans. The codebase also provisions a supporting API key validation Lambda function and related permissions, which shows the platform is designed to accommodate Lambda-based validation flows where needed.&lt;/p&gt;
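
&lt;p&gt;The API key enforcement described above maps onto a small set of REST API resources. The sketch below uses made-up names and throttle numbers and assumes the API, resource, and stage are defined elsewhere.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical inputs: var.rest_api_id, var.resource_id, var.stage_name.
resource "aws_api_gateway_method" "get_quotes" {
  rest_api_id      = var.rest_api_id
  resource_id      = var.resource_id
  http_method      = "GET"
  authorization    = "NONE"
  api_key_required = true # reject calls that lack a valid x-api-key header
}

resource "aws_api_gateway_api_key" "client" {
  name = "example-client"
}

resource "aws_api_gateway_usage_plan" "standard" {
  name = "standard"

  api_stages {
    api_id = var.rest_api_id
    stage  = var.stage_name
  }

  throttle_settings {
    rate_limit  = 50
    burst_limit = 100
  }
}

# A key is only honored once it is attached to a usage plan.
resource "aws_api_gateway_usage_plan_key" "client" {
  key_id        = aws_api_gateway_api_key.client.id
  key_type      = "API_KEY"
  usage_plan_id = aws_api_gateway_usage_plan.standard.id
}
&lt;/code&gt;&lt;/pre&gt;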

&lt;p&gt;The important architectural takeaway is this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keep authentication and traffic governance at the gateway layer&lt;/li&gt;
&lt;li&gt;keep service containers focused on business logic&lt;/li&gt;
&lt;li&gt;keep private workloads private&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That separation keeps the platform easier to secure and easier to reason about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Public and Private API Models
&lt;/h2&gt;

&lt;p&gt;Another strength of our architecture is that it supports both public and private APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Public API
&lt;/h3&gt;

&lt;p&gt;The public API is intended for internet-facing access. It handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;external client access&lt;/li&gt;
&lt;li&gt;API keys and usage plans&lt;/li&gt;
&lt;li&gt;CORS behavior&lt;/li&gt;
&lt;li&gt;Lambda and ECS route exposure&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Private API
&lt;/h3&gt;

&lt;p&gt;The private API is intended for internal or VPC-scoped access. It is useful when services should only be reachable from trusted network boundaries such as internal AWS workloads, integration environments, or enterprise connectivity paths.&lt;/p&gt;

&lt;p&gt;This split is helpful when some capabilities should be public and others should remain internal even though they share the same service platform underneath.&lt;/p&gt;

&lt;h2&gt;
  
  
  Observability and Operations
&lt;/h2&gt;

&lt;p&gt;A microservices platform is only as good as its operational visibility.&lt;/p&gt;

&lt;p&gt;Our architecture includes observability at several levels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CloudWatch log groups for ECS services&lt;/li&gt;
&lt;li&gt;CloudWatch logs for Lambda functions&lt;/li&gt;
&lt;li&gt;API Gateway stage logging&lt;/li&gt;
&lt;li&gt;ALB logging support&lt;/li&gt;
&lt;li&gt;VPC flow logging&lt;/li&gt;
&lt;li&gt;X-Ray-friendly task patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination helps answer the most common production questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did the request reach the gateway?&lt;/li&gt;
&lt;li&gt;Was it routed to the right backend?&lt;/li&gt;
&lt;li&gt;Was the target healthy?&lt;/li&gt;
&lt;li&gt;Did the service fail or time out?&lt;/li&gt;
&lt;li&gt;Was the problem in networking, routing, or application logic?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without that layered visibility, hybrid platforms become difficult to troubleshoot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling Characteristics
&lt;/h2&gt;

&lt;p&gt;This architecture scales well because each layer can evolve somewhat independently.&lt;/p&gt;

&lt;h3&gt;
  
  
  API Layer Scaling
&lt;/h3&gt;

&lt;p&gt;API Gateway absorbs public traffic without requiring the backend to manage edge-facing concerns directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  ECS Scaling
&lt;/h3&gt;

&lt;p&gt;ECS services scale by task count. Each service can define:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;desired count&lt;/li&gt;
&lt;li&gt;minimum and maximum capacity&lt;/li&gt;
&lt;li&gt;CPU and memory sizing&lt;/li&gt;
&lt;li&gt;autoscaling thresholds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means heavily used services can scale out without affecting lighter services.&lt;/p&gt;
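
&lt;p&gt;Per-service scaling of this kind is usually wired through Application Auto Scaling. The example below is a sketch with assumed capacity bounds and a CPU target of 60 percent. The real thresholds would come from each service's configuration.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical inputs: var.cluster_name, var.service_name.
resource "aws_appautoscaling_target" "ecs" {
  service_namespace  = "ecs"
  resource_id        = "service/${var.cluster_name}/${var.service_name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 6
}

# Target tracking adds or removes tasks to hold average CPU near the target.
resource "aws_appautoscaling_policy" "cpu" {
  name               = "${var.service_name}-cpu-target"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value = 60

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;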

&lt;h3&gt;
  
  
  Platform Growth
&lt;/h3&gt;

&lt;p&gt;As more services are added, the platform does not need a new ingress pattern each time. The same path-based routing model continues to work as long as route definitions and listener priorities stay clean.&lt;/p&gt;

&lt;h2&gt;
  
  
  Alignment with AWS Well-Architected Best Practices
&lt;/h2&gt;

&lt;p&gt;This architecture also aligns well with the pillars of the AWS Well-Architected Framework.&lt;/p&gt;

&lt;h3&gt;
  
  
  Operational Excellence
&lt;/h3&gt;

&lt;p&gt;We have structured the platform so that it is operated as a system rather than as a collection of one-off deployments.&lt;/p&gt;

&lt;p&gt;This is reflected in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;staged Terraform deployments for clearer ownership and safer changes&lt;/li&gt;
&lt;li&gt;configuration-driven service onboarding&lt;/li&gt;
&lt;li&gt;consistent ECS service patterns through reusable modules&lt;/li&gt;
&lt;li&gt;standardized logging and deployment workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This reduces manual drift and makes operational changes more repeatable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Security
&lt;/h3&gt;

&lt;p&gt;Security is addressed through layered controls rather than a single protection point.&lt;/p&gt;

&lt;p&gt;We have adhered to good AWS security practices by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;placing ECS services behind private networking rather than exposing them directly&lt;/li&gt;
&lt;li&gt;using API Gateway as the controlled ingress layer&lt;/li&gt;
&lt;li&gt;applying API-level protection at the gateway&lt;/li&gt;
&lt;li&gt;using security groups to limit east-west traffic&lt;/li&gt;
&lt;li&gt;supporting encrypted log and storage patterns&lt;/li&gt;
&lt;li&gt;separating public access from internal service routing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This follows the AWS principles of strong boundaries, least privilege, and defense in depth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reliability
&lt;/h3&gt;

&lt;p&gt;Reliability comes from designing for failure at the service and routing layers.&lt;/p&gt;

&lt;p&gt;We have incorporated that through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multi-AZ subnet placement&lt;/li&gt;
&lt;li&gt;load balancer health checks&lt;/li&gt;
&lt;li&gt;ECS task replacement behavior&lt;/li&gt;
&lt;li&gt;target group isolation per service&lt;/li&gt;
&lt;li&gt;decoupled gateway and backend layers&lt;/li&gt;
&lt;li&gt;staged infrastructure dependencies with clear outputs between layers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means a failing task or unhealthy target does not require the API surface itself to change.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance Efficiency
&lt;/h3&gt;

&lt;p&gt;The architecture chooses the right compute model for the right workload.&lt;/p&gt;

&lt;p&gt;That is an AWS best practice because it avoids treating all traffic the same.&lt;/p&gt;

&lt;p&gt;Examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda for lighter, event-oriented, or supporting workflows&lt;/li&gt;
&lt;li&gt;ECS Fargate for containerized services that need steady HTTP handling&lt;/li&gt;
&lt;li&gt;ALB path-based routing for efficient multi-service consolidation&lt;/li&gt;
&lt;li&gt;service-specific CPU, memory, and scaling settings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This lets us tune services independently instead of overprovisioning everything at the platform level.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost Optimization
&lt;/h3&gt;

&lt;p&gt;Cost optimization is also visible in the design choices.&lt;/p&gt;

&lt;p&gt;We are not multiplying infrastructure unnecessarily. Instead, the architecture encourages shared but controlled platform components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one API layer for many services&lt;/li&gt;
&lt;li&gt;one internal routing layer for many ECS workloads&lt;/li&gt;
&lt;li&gt;shared ECS cluster patterns per environment&lt;/li&gt;
&lt;li&gt;service-level scaling instead of blanket scaling&lt;/li&gt;
&lt;li&gt;support for Fargate and optional capacity-provider strategies where appropriate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is much closer to AWS best practice than provisioning separate ingress and compute stacks for every small service.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sustainability and Maintainability
&lt;/h3&gt;

&lt;p&gt;Even when sustainability is not called out directly, maintainable designs usually consume fewer engineering and infrastructure resources over time.&lt;/p&gt;

&lt;p&gt;The architecture helps here by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reducing duplicated infrastructure definitions&lt;/li&gt;
&lt;li&gt;making service onboarding metadata-driven&lt;/li&gt;
&lt;li&gt;encouraging reuse of shared platform components&lt;/li&gt;
&lt;li&gt;keeping the public contract stable while backend services evolve&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That leads to lower long-term complexity, which is a practical form of architectural efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Pattern Works Well
&lt;/h2&gt;

&lt;p&gt;This AWS pattern is effective because it balances standardization with flexibility.&lt;/p&gt;

&lt;p&gt;It standardizes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deployment stages&lt;/li&gt;
&lt;li&gt;ingress architecture&lt;/li&gt;
&lt;li&gt;service registration&lt;/li&gt;
&lt;li&gt;load-balancer behavior&lt;/li&gt;
&lt;li&gt;logging and health checks&lt;/li&gt;
&lt;li&gt;ECS service creation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It stays flexible by allowing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda-backed endpoints&lt;/li&gt;
&lt;li&gt;ECS-backed endpoints&lt;/li&gt;
&lt;li&gt;public and private APIs&lt;/li&gt;
&lt;li&gt;different service-level scaling and runtime settings&lt;/li&gt;
&lt;li&gt;multiple environments with different networking strategies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly what a growing microservices platform needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Implementation Advice
&lt;/h2&gt;

&lt;p&gt;If you want to implement a similar architecture, a good sequence is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build the networking foundation first.&lt;/li&gt;
&lt;li&gt;Keep all service backends private.&lt;/li&gt;
&lt;li&gt;Put API Gateway in front of everything external.&lt;/li&gt;
&lt;li&gt;Use ECS Fargate for containerized APIs that benefit from long-lived service behavior.&lt;/li&gt;
&lt;li&gt;Use Lambda for support functions and lightweight endpoints.&lt;/li&gt;
&lt;li&gt;Register services through metadata, not repetitive infrastructure definitions.&lt;/li&gt;
&lt;li&gt;Use path-based ALB routing so many services can share one internal ingress layer.&lt;/li&gt;
&lt;li&gt;Add strong health checks and centralized logs before traffic grows.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key is not just choosing AWS services, but assigning each AWS service a clear responsibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Our architecture demonstrates a mature way to implement Lambda and ECS-based microservices through API Gateway without exposing backend services directly.&lt;/p&gt;

&lt;p&gt;The architecture uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;staged Terraform for separation of concerns&lt;/li&gt;
&lt;li&gt;API Gateway as the public and private API facade&lt;/li&gt;
&lt;li&gt;Lambda where serverless execution makes sense&lt;/li&gt;
&lt;li&gt;ECS Fargate for containerized microservices&lt;/li&gt;
&lt;li&gt;NLB and ALB together for private, path-aware routing&lt;/li&gt;
&lt;li&gt;config-driven onboarding for scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For teams building an enterprise microservices platform, this is a strong pattern because it supports security, operational clarity, and service growth without forcing every workload into the same runtime model.&lt;/p&gt;

&lt;p&gt;Most importantly, it turns infrastructure into a reusable platform. Once that platform is in place, adding the next service becomes much easier than adding the first one.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lessons Learned
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Keeping API Gateway as the front door and backend services private makes the architecture easier to secure and easier to evolve.&lt;/li&gt;
&lt;li&gt;Using both Lambda and ECS is more practical than forcing every use case into a single compute model.&lt;/li&gt;
&lt;li&gt;Path-based routing through shared internal load balancing scales better than creating isolated ingress infrastructure for every service.&lt;/li&gt;
&lt;li&gt;Service onboarding becomes significantly easier when routing, health checks, scaling, and runtime settings are driven by configuration.&lt;/li&gt;
&lt;li&gt;Health checks, logging, and observability need to be designed from the beginning; adding them later is much harder in a distributed system.&lt;/li&gt;
&lt;li&gt;A staged infrastructure model reduces operational risk because networking, compute, and API exposure can be changed independently.&lt;/li&gt;
&lt;li&gt;Standardizing platform patterns early saves substantial effort as the number of microservices grows.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>architecture</category>
      <category>aws</category>
      <category>lambda</category>
      <category>apigateway</category>
    </item>
    <item>
      <title>Building a Practical Lambda Capacity Provider Platform: Lessons Learned from Warm Pools, Version Hygiene, and CI/CD Reality</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 20 Apr 2026 18:25:06 +0000</pubDate>
      <link>https://dev.to/amitkayal/building-a-practical-lambda-capacity-provider-platform-lessons-learned-from-warm-pools-version-1l7j</link>
      <guid>https://dev.to/amitkayal/building-a-practical-lambda-capacity-provider-platform-lessons-learned-from-warm-pools-version-1l7j</guid>
      <description>&lt;h1&gt;
  
  
  Building a Practical Lambda Capacity Provider Platform: Lessons Learned from Warm Pools, Version Hygiene, and CI/CD Reality
&lt;/h1&gt;

&lt;p&gt;There is a big difference between a slide-deck architecture and a running system you can trust on a Monday morning.&lt;/p&gt;

&lt;p&gt;This implementation captures that difference well. On paper, the idea is simple: create a shared AWS Lambda Managed Instances capacity provider, run latency-sensitive workloads on ARM64, keep the pool warm with EventBridge, prune old Lambda versions before they become operational debt, and wrap the whole thing in a GitHub Actions plus CodeBuild delivery model. In practice, each of those choices changes how you think about performance, cost, blast radius, and developer discipline.&lt;/p&gt;

&lt;p&gt;What follows is not a generic cloud post. It is the kind of write-up you produce after actually building and living with the system.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Problem We Were Solving
&lt;/h2&gt;

&lt;p&gt;Traditional Lambda is excellent when you want abstraction and convenience. It becomes less elegant when your workload is sensitive to startup time, carries heavier dependencies, or needs more predictable execution behavior under bursty load.&lt;/p&gt;

&lt;p&gt;That is where a Lambda capacity provider changes the discussion.&lt;/p&gt;

&lt;p&gt;In this implementation, the platform is built around a shared &lt;code&gt;aws_lambda_capacity_provider&lt;/code&gt; that uses ARM64 Graviton instances and auto scaling. The core idea is straightforward: instead of leaving execution placement entirely to the default Lambda fleet, we deliberately provide a managed compute pool that multiple functions can share. That gives us more control over cost-performance characteristics and lets us design around cold-start pain rather than merely complain about it.&lt;/p&gt;

&lt;p&gt;The choice is visible in the Terraform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The provider runs on &lt;code&gt;arm64&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Allowed instance types are constrained to &lt;code&gt;m6g.large&lt;/code&gt;, &lt;code&gt;m6g.xlarge&lt;/code&gt;, &lt;code&gt;m7g.large&lt;/code&gt;, and &lt;code&gt;m7g.xlarge&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Scaling is set to &lt;code&gt;Auto&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The maximum pool ceiling is set to &lt;code&gt;64&lt;/code&gt; vCPU&lt;/li&gt;
&lt;li&gt;The capacity provider is placed in the default VPC, with unsupported Availability Zones filtered out&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point matters more than it first appears. The code explicitly excludes unsupported AZs such as &lt;code&gt;us-east-1e&lt;/code&gt;, which is a good example of operational maturity: the happy path is not enough when the service itself has placement constraints.&lt;/p&gt;

&lt;h2&gt;
  
  
  How We Actually Created the Capacity Provider
&lt;/h2&gt;

&lt;p&gt;One thing I wanted this platform to avoid was "concept architecture" with no implementation backbone. So the capacity provider here is not described abstractly. It is provisioned directly in Terraform and wired into the Lambda lifecycle in a fairly intentional way.&lt;/p&gt;

&lt;p&gt;The build starts in &lt;code&gt;terraform_file/agent_core_sync_cp.tf&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;First, the capacity provider itself is created with &lt;code&gt;aws_lambda_capacity_provider&lt;/code&gt;. The naming pattern ties it to the service and environment, which is the right instinct for multi-environment operation. The provider is tagged as shared compute for agent workloads, which matters later for discoverability and platform governance.&lt;/p&gt;

&lt;p&gt;Second, the provider is placed inside the default VPC, but not blindly. In &lt;code&gt;terraform_file/data.tf&lt;/code&gt;, the code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;discovers the default VPC&lt;/li&gt;
&lt;li&gt;fetches the default subnets&lt;/li&gt;
&lt;li&gt;inspects subnet Availability Zones one by one&lt;/li&gt;
&lt;li&gt;excludes unsupported zones such as &lt;code&gt;us-east-1e&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;optionally caps how many subnets are used&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a subtle but important design choice. Lambda Managed Instances often create one placement footprint per subnet or AZ. If you do not control subnet spread, you can end up creating more infrastructure surface area than you intended.&lt;/p&gt;
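
&lt;p&gt;The discovery steps listed above can be approximated with standard AWS data sources. This is a reconstruction for illustration, not the contents of &lt;code&gt;terraform_file/data.tf&lt;/code&gt;. The &lt;code&gt;max_subnets&lt;/code&gt; variable and the local names are invented for the example.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical cap on how many subnets the capacity provider may use.
variable "max_subnets" {
  type    = number
  default = 3
}

data "aws_vpc" "default" {
  default = true
}

data "aws_subnets" "default" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.default.id]
  }
}

# Look up each subnet individually so its Availability Zone can be inspected.
data "aws_subnet" "each" {
  for_each = toset(data.aws_subnets.default.ids)
  id       = each.value
}

locals {
  unsupported_azs = ["us-east-1e"]

  # Keep only subnets outside the unsupported zones, in a stable order.
  supported_subnet_ids = sort([
    for s in data.aws_subnet.each : s.id
    if !contains(local.unsupported_azs, s.availability_zone)
  ])

  # Optionally cap the subnet spread.
  selected_subnet_ids = slice(
    local.supported_subnet_ids,
    0,
    min(length(local.supported_subnet_ids), var.max_subnets)
  )
}
&lt;/code&gt;&lt;/pre&gt;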

&lt;p&gt;Third, the provider uses a dedicated security group rather than inheriting something vague and accidental. The current implementation keeps outbound traffic fully open and allows inbound HTTPS. That is permissive, but it is at least explicit and repeatable. Early-stage platforms benefit from that kind of clarity.&lt;/p&gt;

&lt;p&gt;Fourth, the capacity provider gets its own operator role through &lt;code&gt;AWSLambdaManagedEC2ResourceOperator&lt;/code&gt;. That is a critical detail. Capacity providers are not just Lambda resources; they need AWS to manage the EC2-backed execution infrastructure on your behalf. If you miss that role, the platform does not really exist no matter how nice your Terraform looks.&lt;/p&gt;

&lt;p&gt;Fifth, the instance requirements are opinionated. The code forces &lt;code&gt;arm64&lt;/code&gt; and narrows the fleet to supported Graviton M-family instance types. That is one of the better engineering decisions in this implementation because it converts an architectural preference into an enforceable runtime rule.&lt;/p&gt;

&lt;p&gt;Finally, the Lambda function is attached to the capacity provider in &lt;code&gt;terraform_file/lambda_clm_router_agent.tf&lt;/code&gt; through &lt;code&gt;capacity_provider_config&lt;/code&gt;. That is where the abstraction becomes real. We are not just provisioning a pool and hoping someone uses it later. We are explicitly binding a published Lambda to that pool and tuning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory GiB per vCPU&lt;/li&gt;
&lt;li&gt;max concurrency per execution environment&lt;/li&gt;
&lt;li&gt;ARM64 runtime alignment&lt;/li&gt;
&lt;li&gt;published versioning through Lambda aliases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the full loop: provision shared compute, constrain placement, grant AWS the operator role it needs, attach live functions to the pool, and then manage the resulting version sprawl with automation. That is what makes this feel like a platform artifact rather than a loose Terraform experiment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 1: A Capacity Provider Is Not a Tuning Knob. It Is an Operating Model.
&lt;/h2&gt;

&lt;p&gt;Teams often talk about capacity providers as if they are just a performance optimization. That framing is too shallow.&lt;/p&gt;

&lt;p&gt;The moment you move Lambda onto managed instances, you are no longer only buying faster startup. You are adopting a new operating model with very clear implications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You now care about instance family compatibility&lt;/li&gt;
&lt;li&gt;You need to think about subnet strategy and AZ support&lt;/li&gt;
&lt;li&gt;You have to reason about pool scaling ceilings, concurrency, and memory per vCPU&lt;/li&gt;
&lt;li&gt;You are effectively blending serverless ergonomics with infrastructure accountability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This implementation shows that transition clearly. The CLM router Lambda is not just declared with a runtime and handler. It is attached to the shared capacity provider and explicitly tuned with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;execution_environment_memory_gib_per_vcpu&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;per_execution_environment_max_concurrency&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;publish = true&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;architectures = ["arm64"]&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the tell. Once we start specifying how execution environments should behave, we are no longer simply "deploying a Lambda." We are shaping compute economics.&lt;/p&gt;

&lt;p&gt;The practical lesson here is simple: if you adopt Lambda Managed Instances, treat it like platform engineering, not like a runtime checkbox.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 2: ARM64 Delivers Real Value, but Only if You Respect Service Constraints
&lt;/h2&gt;

&lt;p&gt;One of the strongest decisions in this implementation is the bias toward Graviton. For Python-heavy agent workloads, ARM64 is usually the right default. The economics are better, and the performance-per-dollar story is often compelling.&lt;/p&gt;

&lt;p&gt;But there is an important nuance that the Terraform comments correctly capture: not every EC2 family you might expect is supported in the way you assume. This implementation explicitly avoids unsupported combinations and narrows the fleet to supported M-family Graviton instances.&lt;/p&gt;

&lt;p&gt;That is a good lesson in cloud architecture generally: cloud products market flexibility, but production systems survive on constraint management.&lt;/p&gt;

&lt;p&gt;The teams that do well with modern AWS services are not the ones that assume every SKU works. They are the ones that encode the service's real boundaries in Terraform so no one has to rediscover them during an incident window.&lt;/p&gt;
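
&lt;p&gt;One way to keep that boundary from living only in comments is a small pre-deploy check. What follows is a minimal sketch, not code from this implementation: the &lt;code&gt;SUPPORTED_FAMILIES&lt;/code&gt; values and the &lt;code&gt;unsupported_instance_types&lt;/code&gt; helper are hypothetical, and the allowlist should come from whatever the service documentation lists for your Region.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical allowlist: replace with the families your Region actually supports.
SUPPORTED_FAMILIES = {"m7g", "m8g"}


def unsupported_instance_types(instance_types, supported=SUPPORTED_FAMILIES):
    """Return the instance types whose family is not in the allowlist."""
    rejected = []
    for instance_type in instance_types:
        # "m7g.large" splits into family "m7g" and size "large".
        family = instance_type.split(".", 1)[0]
        if family not in supported:
            rejected.append(instance_type)
    return rejected


if __name__ == "__main__":
    requested = ["m7g.large", "c7g.xlarge", "m8g.2xlarge"]
    bad = unsupported_instance_types(requested)
    if bad:
        print("Unsupported instance types:", ", ".join(bad))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running a check like this in CI turns a rediscovered constraint into a failed build instead of a failed deployment.&lt;/p&gt;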

&lt;h2&gt;
  
  
  Lesson 3: Warmup Is Not a Hack. It Is a Deliberate Control Loop.
&lt;/h2&gt;

&lt;p&gt;There is a tendency in engineering circles to treat "warming" as a slightly embarrassing workaround. I think that is the wrong mindset.&lt;/p&gt;

&lt;p&gt;This implementation schedules the CLM router Lambda every five minutes through EventBridge. The handler itself is intentionally lightweight and effectively acts as a keep-alive mechanism. That is not laziness. It is an explicit decision to keep the shared pool alive for latency-sensitive traffic.&lt;/p&gt;

&lt;p&gt;More specifically, the warmer exists to reduce the probability that the capacity provider has to spin up fresh managed instance capacity for a new invocation path after a quiet period. That is the practical point of the EventBridge rule in &lt;code&gt;terraform_file/eventbridge_cp_arm.tf&lt;/code&gt;. By invoking the Lambda on a steady &lt;code&gt;rate(5 minutes)&lt;/code&gt; schedule, the platform keeps the execution path warm enough that the shared capacity provider is less likely to fall all the way back to a cold, scale-from-zero posture right before a real request arrives.&lt;/p&gt;
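
&lt;p&gt;A lightweight handler of this kind can be sketched as follows. This is an illustration under two assumptions: the EventBridge rule sends the default scheduled-event payload (which carries &lt;code&gt;source&lt;/code&gt; set to &lt;code&gt;aws.events&lt;/code&gt; and &lt;code&gt;detail-type&lt;/code&gt; set to &lt;code&gt;Scheduled Event&lt;/code&gt;), and &lt;code&gt;route_request&lt;/code&gt; is a hypothetical placeholder for the real router logic.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def is_warmup_event(event):
    """True for EventBridge scheduled events, identified by these two fields."""
    return (
        isinstance(event, dict)
        and event.get("source") == "aws.events"
        and event.get("detail-type") == "Scheduled Event"
    )


def route_request(event):
    # Hypothetical placeholder for the real router logic.
    return {"statusCode": 200, "body": "routed"}


def handler(event, context):
    # Return immediately on warm-up pings so they stay cheap and never
    # touch downstream systems.
    if is_warmup_event(event):
        return {"statusCode": 200, "body": "warm"}
    return route_request(event)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;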

&lt;p&gt;The important insight is this: once you care about cold-start predictability, you need a control loop.&lt;/p&gt;

&lt;p&gt;That control loop can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provisioned concurrency&lt;/li&gt;
&lt;li&gt;Scheduled warmers&lt;/li&gt;
&lt;li&gt;Request shaping&lt;/li&gt;
&lt;li&gt;A shared managed instance pool&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this design, the team chose scheduled warm invocation plus a shared capacity provider. That is a sensible middle ground. It is cheaper and simpler than overcommitting always-on infrastructure, while still materially reducing the first-hit penalty.&lt;/p&gt;

&lt;p&gt;In plain English: the EventBridge warmer is being used here so the capacity provider does not need to spin up a brand-new server footprint every time traffic reappears after idle time. For interactive or latency-sensitive agent workloads, that is a very practical optimization.&lt;/p&gt;

&lt;p&gt;The strategic lesson is that warmup should be measured against business latency, not ideological purity. If a five-minute EventBridge schedule protects user experience and keeps cost acceptable, it is doing its job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 4: Shared Pools Create Efficiency, but They Also Create Coupling
&lt;/h2&gt;

&lt;p&gt;The capacity provider here is intentionally shared across platform agents and automation services. That is the right move early in a platform journey because it improves utilization and prevents every Lambda from inventing its own isolated infrastructure story.&lt;/p&gt;

&lt;p&gt;But shared pools always introduce two forms of coupling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Technical coupling, because multiple workloads compete for the same execution substrate&lt;/li&gt;
&lt;li&gt;Organizational coupling, because one team's deployment patterns can affect another team's cost and performance envelope&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why the concurrency controls here matter. The CLM router function uses a per-execution-environment concurrency setting, and the environment-specific &lt;code&gt;.tfvars&lt;/code&gt; files pin that concurrency to &lt;code&gt;4&lt;/code&gt;. That is more than a performance number. It is a fairness policy.&lt;/p&gt;

&lt;p&gt;If I were advising a platform team scaling this pattern, I would say this clearly: shared capacity providers are excellent, but they need quota thinking from day one. Otherwise the first successful workload becomes the first noisy neighbor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 5: If You Publish Versions Aggressively, You Need Lifecycle Hygiene on Day One
&lt;/h2&gt;

&lt;p&gt;This implementation makes another good call: the Lambda functions are published, aliased, and then cleaned up with an automated version pruner.&lt;/p&gt;

&lt;p&gt;That matters because version sprawl is one of those quiet operational problems that teams ignore until it becomes annoying enough to disrupt deployments. Published versions accumulate quickly when CI/CD is active. If you do not manage them, you eventually pay in clutter, confusion, or hard service limits such as Lambda's per-Region quota on function and layer storage.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;lambda_version_pruner&lt;/code&gt; implementation is stronger than a simplistic cleanup script because it preserves what actually matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It scans all Lambda functions&lt;/li&gt;
&lt;li&gt;It filters only functions associated with the target capacity provider&lt;/li&gt;
&lt;li&gt;It lists all aliases and protects aliased versions&lt;/li&gt;
&lt;li&gt;It keeps the latest N published versions&lt;/li&gt;
&lt;li&gt;It deletes everything older that is neither current nor aliased&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly the kind of automation mature teams invest in. Not glamorous. Very valuable.&lt;/p&gt;

&lt;p&gt;There is also an understated platform principle here: rollback is not just about keeping artifacts. It is about keeping the right artifacts. By preserving aliased versions, the pruner respects deployment intent rather than blindly optimizing for tidiness.&lt;/p&gt;
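
&lt;p&gt;The selection step described above can be sketched as a pure function. This is not the actual &lt;code&gt;lambda_version_pruner&lt;/code&gt; code, which is not reproduced here; the function name and the &lt;code&gt;keep_latest&lt;/code&gt; default are assumptions. In a real run the inputs would come from Lambda's list-versions and list-aliases calls, and each returned version would then be deleted by qualifier.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def versions_to_delete(published_versions, aliased_versions, keep_latest=3):
    """Pick published versions that are safe to delete.

    published_versions: version strings as Lambda reports them, e.g. ["1", "2"].
    aliased_versions: versions that some alias currently points at.
    keep_latest: how many of the newest versions to always keep (the N above).
    """
    # "$LATEST" is not a published version and is never a deletion candidate.
    numbered = sorted(
        (int(v) for v in published_versions if v != "$LATEST"),
        reverse=True,
    )
    keep = set(numbered[: max(keep_latest, 0)])
    # Aliased versions express deployment intent, so they are always protected.
    keep.update(int(v) for v in aliased_versions if v != "$LATEST")
    return [str(v) for v in sorted(numbered) if v not in keep]


if __name__ == "__main__":
    versions = ["$LATEST", "1", "2", "3", "4", "5", "6"]
    print(versions_to_delete(versions, aliased_versions=["2"], keep_latest=3))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping the decision separate from the API calls makes the protection rules easy to unit test before the pruner is allowed to delete anything.&lt;/p&gt;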

&lt;p&gt;There is also a more practical capacity-provider reason for doing this, and it deserves to be stated directly.&lt;/p&gt;

&lt;p&gt;When you run a shared Lambda Managed Instances pool, you want the platform to spend its effort on the versions that are actually serving traffic, warming correctly, or remaining available for safe rollback. If old published versions keep accumulating forever, three unhealthy things tend to happen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;operators lose clarity on which versions are still meaningful&lt;/li&gt;
&lt;li&gt;rollback and alias management become noisier than they should be&lt;/li&gt;
&lt;li&gt;the shared platform carries more deployment residue than useful runtime intent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Strictly speaking, deleting old Lambda versions does not add CPU to the capacity provider. What it does is improve platform hygiene around the shared pool: the versions attached to aliases, warmup patterns, and deployment workflows remain deliberate and limited. In other words, the benefit is operational rather than computational. The pool is easier to run well when there is less version sprawl around the workloads that consume it.&lt;/p&gt;

&lt;p&gt;That matters in real operations. The healthier the deployment surface is, the easier it is to reason about what is warming, what is active, what can be rolled back, and what should no longer influence the platform at all.&lt;/p&gt;

&lt;p&gt;So the version pruner is not just a cleanup utility. It is part of making the shared capacity provider operationally efficient. Not by adding raw compute, but by reducing noise, protecting the versions that matter, and keeping the platform focused on live execution paths instead of historical leftovers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 6: GitHub Actions Should Orchestrate. CodeBuild Should Execute.
&lt;/h2&gt;

&lt;p&gt;Architecturally, the CI/CD model here is sensible.&lt;/p&gt;

&lt;p&gt;GitHub Actions is used as the control plane:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;branch-based triggering&lt;/li&gt;
&lt;li&gt;security scanning&lt;/li&gt;
&lt;li&gt;environment selection&lt;/li&gt;
&lt;li&gt;AWS credential injection&lt;/li&gt;
&lt;li&gt;build orchestration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS CodeBuild is used as the execution plane:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terraform install&lt;/li&gt;
&lt;li&gt;&lt;code&gt;terraform init&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;terraform validate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;terraform plan&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;terraform apply&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I like this split. It keeps GitHub Actions lightweight and makes AWS the place where the actual infrastructure mutation happens. That usually gives better access control, cleaner auditability, and fewer surprises around long-running plan or apply steps.&lt;/p&gt;

&lt;p&gt;The buildspecs pin Terraform &lt;code&gt;1.12.2&lt;/code&gt;, install the CLI explicitly, and then execute plan/apply flows with environment-specific variable files. That is exactly the kind of boring repeatability you want in infrastructure delivery.&lt;/p&gt;

&lt;p&gt;This is one of the most practical lessons from the implementation: do not force GitHub Actions to be your full deployment runtime if AWS-native execution gives you better control.&lt;/p&gt;
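
&lt;p&gt;To make the split concrete, here is a rough sketch of what the orchestration side amounts to: start a CodeBuild build, pass the environment in, and wait for a terminal status. How this repo's workflows actually invoke CodeBuild is not shown here, so treat the function name, the variable names, and the polling interval as assumptions. The client is passed in (for example a boto3 CodeBuild client) so the logic can be exercised without AWS access.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time


def run_codebuild(client, project_name, env_vars, poll_seconds=10, sleep=time.sleep):
    """Start a CodeBuild build and block until it leaves IN_PROGRESS."""
    overrides = [
        {"name": name, "value": value, "type": "PLAINTEXT"}
        for name, value in sorted(env_vars.items())
    ]
    build_id = client.start_build(
        projectName=project_name,
        environmentVariablesOverride=overrides,
    )["build"]["id"]

    while True:
        build = client.batch_get_builds(ids=[build_id])["builds"][0]
        status = build["buildStatus"]
        if status != "IN_PROGRESS":
            # SUCCEEDED, FAILED, FAULT, TIMED_OUT, or STOPPED.
            return status
        sleep(poll_seconds)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Everything that mutates infrastructure stays inside the CodeBuild project; the caller only decides which project to run, with which environment, and what to do with the final status.&lt;/p&gt;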

&lt;h2&gt;
  
  
  Lesson 7: CI/CD Maturity Is Not About Having a Pipeline. It Is About Where the Gates Actually Are.
&lt;/h2&gt;

&lt;p&gt;The implementation also reveals a harder truth: CI/CD design is won or lost not by YAML volume, but by trigger discipline.&lt;/p&gt;

&lt;p&gt;There are some good instincts here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dev deployment is chained off a successful security workflow&lt;/li&gt;
&lt;li&gt;Security scanning runs on push and PR for &lt;code&gt;dev&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;PR security review is scoped only to actual code and infrastructure changes&lt;/li&gt;
&lt;li&gt;Environment-specific secrets are used for AWS access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That said, the current implementation also shows the kinds of issues every fast-moving team encounters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The dev deploy workflow is triggered by &lt;code&gt;Security Checks (Push)&lt;/code&gt;, not by a broader quality gate such as tests plus security plus static analysis&lt;/li&gt;
&lt;li&gt;The QA workflow is currently triggered on &lt;code&gt;pull_request&lt;/code&gt; to &lt;code&gt;qa&lt;/code&gt;, yet it also includes an apply stage, which is a risky combination&lt;/li&gt;
&lt;li&gt;The sanity workflow references a different CodeBuild project naming pattern, which looks like copy-forward drift from another implementation&lt;/li&gt;
&lt;li&gt;One dev apply step mixes generic and environment-specific secrets in a way that deserves tightening&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a criticism of the team. It is actually the most authentic part of the system.&lt;/p&gt;

&lt;p&gt;Real pipelines evolve through reuse, renaming, urgency, and partial migration. The useful engineering habit is not pretending they are pristine. It is recognizing that pipeline drift is itself a production concern.&lt;/p&gt;

&lt;p&gt;My blunt lesson here is this: CI/CD is software. It needs the same review rigor as application code.&lt;/p&gt;
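
&lt;p&gt;Treating the pipeline as software also means it can be checked like software. The sketch below flags any workflow that can reach an apply step from a &lt;code&gt;pull_request&lt;/code&gt; trigger. The input shape is a deliberate simplification that I made up for illustration, not the GitHub Actions schema, and the function name is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def risky_workflows(workflows):
    """Flag workflows that can run an apply step from a pull_request trigger.

    workflows: mapping of workflow name to a dict with "triggers" (event
    names) and "steps" (shell commands). Simplified shape, for illustration.
    """
    flagged = []
    for name, workflow in sorted(workflows.items()):
        on_pr = "pull_request" in workflow.get("triggers", [])
        applies = any(
            "terraform apply" in step for step in workflow.get("steps", [])
        )
        if on_pr and applies:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    example = {
        "qa": {
            "triggers": ["pull_request"],
            "steps": ["terraform plan", "terraform apply -auto-approve"],
        },
        "dev": {"triggers": ["workflow_run"], "steps": ["terraform apply"]},
    }
    print(risky_workflows(example))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;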

&lt;h2&gt;
  
  
  Lesson 8: Documentation Drift Is a Reliability Signal
&lt;/h2&gt;

&lt;p&gt;The README here is ambitious and useful, but parts of it clearly describe a broader or earlier architecture than the exact files currently present. That mismatch is more important than most teams realize.&lt;/p&gt;

&lt;p&gt;When documentation and implementation diverge, three things happen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;new engineers learn the wrong system&lt;/li&gt;
&lt;li&gt;reviewers approve changes with outdated mental models&lt;/li&gt;
&lt;li&gt;incidents take longer to resolve because operators trust stale diagrams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One of the best engineering habits is to treat documentation drift as an operational bug, not as a cosmetic issue.&lt;/p&gt;

&lt;p&gt;This implementation makes that case well. The code is the source of truth. The docs are directionally strong, but some names, workflow descriptions, and file references have clearly moved over time. That is normal. What matters is catching it before the next engineer builds decisions on old assumptions.&lt;/p&gt;
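
&lt;p&gt;One inexpensive way to catch this class of drift is to check that every file path the README mentions still exists. The following is a heuristic sketch only: the regular expression, the function name, and the assumption that paths appear as backtick-quoted inline code in a Markdown README are all mine, and it will not catch stale workflow descriptions or diagrams.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re
from pathlib import Path

# Matches inline-code spans that look like relative file paths, such as
# `dir/file.tf`. The extension must start with a letter, so version
# strings like `1.12.2` are ignored.
PATH_PATTERN = re.compile(r"`([\w./-]+\.[A-Za-z][A-Za-z0-9]*)`")


def missing_doc_references(readme_text, repo_root):
    """Return file paths mentioned in the README that do not exist on disk."""
    root = Path(repo_root)
    referenced = sorted(set(PATH_PATTERN.findall(readme_text)))
    return [path for path in referenced if not (root / path).exists()]


if __name__ == "__main__":
    readme = Path("README.md")
    if readme.exists():
        for path in missing_doc_references(readme.read_text(), "."):
            print("README references a missing file:", path)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;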

&lt;h2&gt;
  
  
  Lesson 9: The Default VPC Is Fine for Speed, but It Should Be a Conscious Temporary Convenience
&lt;/h2&gt;

&lt;p&gt;The Terraform intentionally uses the default VPC and default subnets, then layers in filtering and a custom security group. For early velocity, that is an acceptable choice. It removes friction and makes the first deployment much easier.&lt;/p&gt;

&lt;p&gt;But teams should be honest about the tradeoff.&lt;/p&gt;

&lt;p&gt;Using the default VPC accelerates setup. It does not provide the same clarity, segmentation, or policy hygiene that a dedicated workload VPC eventually should. The inbound HTTPS rule from &lt;code&gt;0.0.0.0/0&lt;/code&gt; is another example of where a practical early-stage decision should later be revisited with a more opinionated security posture.&lt;/p&gt;

&lt;p&gt;My view is simple: default VPC usage is fine when it is a speed decision. It becomes dangerous when it silently hardens into architecture.&lt;/p&gt;
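
&lt;p&gt;A simple guard against that silent hardening is to report internet-open ingress on every review. The sketch below works on a dictionary shaped like one entry of the EC2 describe-security-groups response; the function name is hypothetical, and it only checks IPv4 ranges.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def open_ingress_rules(security_group):
    """List (protocol, from_port, to_port) for rules open to 0.0.0.0/0."""
    open_rules = []
    for permission in security_group.get("IpPermissions", []):
        cidrs = [r.get("CidrIp") for r in permission.get("IpRanges", [])]
        if "0.0.0.0/0" in cidrs:
            open_rules.append(
                (
                    permission.get("IpProtocol"),
                    permission.get("FromPort"),
                    permission.get("ToPort"),
                )
            )
    return open_rules


if __name__ == "__main__":
    group = {
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": 443,
                "ToPort": 443,
                "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
            }
        ]
    }
    print(open_ingress_rules(group))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;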

&lt;h2&gt;
  
  
  Lesson 10: Least Privilege Usually Loses the First Battle. Do Not Let It Lose the War.
&lt;/h2&gt;

&lt;p&gt;The Lambda IAM policy for the router function is broad. Very broad.&lt;/p&gt;

&lt;p&gt;That is common when a platform team is trying to unblock integration work quickly across S3, SQS, SNS, DynamoDB, Bedrock, AppSync, logs, X-Ray, and secrets. The version pruner is noticeably tighter, which is encouraging. But the broader pattern remains familiar: the first version of a system usually over-grants.&lt;/p&gt;

&lt;p&gt;The lesson is not "never do that." The lesson is "know when you are doing it, and schedule the hardening work while the platform is still comprehensible."&lt;/p&gt;

&lt;p&gt;Security debt compounds. The longer a wide-open policy survives, the more invisible it becomes.&lt;/p&gt;
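
&lt;p&gt;One way to keep a broad policy visible is to list its wildcard statements on every change. This is a minimal sketch that assumes a standard IAM policy document loaded as a dictionary; the function name is hypothetical, and it is no substitute for a proper access analysis.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def wildcard_statements(policy_document):
    """Return Allow statements that use a wildcard action or resource."""

    def as_list(value):
        # IAM allows a single string or a list in these fields.
        return value if isinstance(value, list) else [value]

    flagged = []
    for statement in as_list(policy_document.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = as_list(statement.get("Action", []))
        resources = as_list(statement.get("Resource", []))
        broad_action = any(a == "*" or a.endswith(":*") for a in actions)
        broad_resource = "*" in resources
        if broad_action or broad_resource:
            flagged.append(statement)
    return flagged
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the count of flagged statements only ever goes up, the hardening work is not actually scheduled.&lt;/p&gt;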

&lt;h2&gt;
  
  
  What This Repo Gets Right
&lt;/h2&gt;

&lt;p&gt;If I strip away the drift and focus on the platform instincts, this implementation gets a lot right:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It treats capacity provider infrastructure as shared platform capability, not one-off function plumbing&lt;/li&gt;
&lt;li&gt;It optimizes for ARM64 economics instead of defaulting to x86 out of habit&lt;/li&gt;
&lt;li&gt;It acknowledges cold starts as a business problem and addresses them operationally&lt;/li&gt;
&lt;li&gt;It preserves rollback safety with aliases while still pruning version sprawl&lt;/li&gt;
&lt;li&gt;It separates orchestration from execution in CI/CD&lt;/li&gt;
&lt;li&gt;It encodes AWS service constraints in Terraform comments and defaults, which reduces tribal knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a strong foundation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Would Improve Next
&lt;/h2&gt;

&lt;p&gt;If I were turning this into the next version of a production-grade internal platform, I would prioritize the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Tighten naming consistency across the implementation.&lt;br&gt;
The capacity provider name appears in slightly different forms across resources. That is how automation misses its target. Shared naming locals should eliminate this class of error.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Make QA and production promotion rules stricter.&lt;br&gt;
The PR-triggered apply path should be removed. The cleaner model is to plan on pull requests and apply only from a protected branch or behind an approved environment gate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run Terraform from a single explicit working directory.&lt;br&gt;
The current layout places Terraform under &lt;code&gt;terraform_file/&lt;/code&gt;, while some buildspec commands read like root-level execution. That ambiguity should be eliminated.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Move from broad IAM toward intent-based policies.&lt;br&gt;
Especially for the router Lambda, policy scope should narrow as the workload stabilizes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Revisit networking posture.&lt;br&gt;
The default VPC is fine for speed; a dedicated VPC model is better for longevity, auditability, and controlled ingress.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add stronger deployment quality gates.&lt;br&gt;
Security review is useful, but infrastructure promotion should also hang off validation, tests, linting, and explicit approval where appropriate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add platform observability as code.&lt;br&gt;
CloudWatch alarms, dashboards, and cost visibility for the capacity provider should be treated as first-class Terraform resources, not follow-up tasks.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Bigger Technical Lesson
&lt;/h2&gt;

&lt;p&gt;The biggest takeaway from this implementation is not about Lambda specifically.&lt;/p&gt;

&lt;p&gt;It is about how modern platform teams should build.&lt;/p&gt;

&lt;p&gt;We should absolutely chase better cost-performance curves. We should use managed primitives aggressively. We should automate the boring work. But we also need the discipline to encode what we learn while the system is still small enough to reason about.&lt;/p&gt;

&lt;p&gt;What makes this useful is that it shows both halves of real engineering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the architectural intent&lt;/li&gt;
&lt;li&gt;the implementation scars&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination is where credible engineering judgment comes from.&lt;/p&gt;

&lt;p&gt;Anyone can present a clean target state. The harder and more useful skill is building systems that survive contact with deployment friction, service constraints, naming drift, and operational reality.&lt;/p&gt;

&lt;p&gt;That is what this implementation is doing. And that is why the lessons here matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Thought
&lt;/h2&gt;

&lt;p&gt;Capacity providers, warmers, version pruning, and GitHub-driven delivery are not separate topics. They are all answers to the same technical question:&lt;/p&gt;

&lt;p&gt;How do we make cloud systems faster, cheaper, safer, and more repeatable without turning every application team into a specialized infrastructure group?&lt;/p&gt;

&lt;p&gt;In this implementation, the answer was to centralize the hard platform decisions, automate the hygiene, keep the runtime warm where it matters, and stay honest about the places where the system still needs tightening.&lt;/p&gt;

&lt;p&gt;That is not just good infrastructure work.&lt;/p&gt;

&lt;p&gt;That is good engineering practice.&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>lambda</category>
      <category>aws</category>
    </item>
    <item>
      <title>Lessons I learned building a memory-aware agent with Amazon Bedrock AgentCore Runtime</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 20 Apr 2026 18:10:16 +0000</pubDate>
      <link>https://dev.to/amitkayal/lessons-i-learned-building-a-memory-aware-agent-with-amazon-bedrock-agentcore-runtime-4lc9</link>
      <guid>https://dev.to/amitkayal/lessons-i-learned-building-a-memory-aware-agent-with-amazon-bedrock-agentcore-runtime-4lc9</guid>
      <description>&lt;h1&gt;
  
  
  Lessons I learned building a memory-aware agent with Amazon Bedrock AgentCore Runtime
&lt;/h1&gt;

&lt;p&gt;When I started building an agent with Amazon Bedrock AgentCore Runtime, I thought the difficult parts would be model selection, tool wiring, and deployment. Those certainly mattered, but the part that shaped the quality of the agent most was memory.&lt;/p&gt;

&lt;p&gt;The first version of the agent could answer single prompts well enough, but it did not behave like a real multi-turn system. Follow-up questions were brittle. The agent lost short-range intent. Tool usage worked, but only within the narrow boundaries of the current prompt. As soon as the conversation depended on what happened one or two turns earlier, the system started to feel less like an agent and more like a stateless inference endpoint.&lt;/p&gt;

&lt;p&gt;That experience changed how I approached the design. I stopped thinking about memory as a convenience feature and started treating it as part of the runtime architecture itself. This article is a distillation of the most important lessons I learned while building a short-term-memory-aware agent with Amazon Bedrock AgentCore Runtime and Strands.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 1: An agent is not really multi-turn until memory is part of the lifecycle
&lt;/h2&gt;

&lt;p&gt;One of the first things I learned is that conversational continuity does not emerge automatically just because the application calls the same runtime repeatedly.&lt;/p&gt;

&lt;p&gt;Without short-term memory, the agent only sees the current prompt unless the application keeps reconstructing and replaying history manually. That creates several problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;previous instructions are easy to lose,&lt;/li&gt;
&lt;li&gt;tool chains become fragile across turns,&lt;/li&gt;
&lt;li&gt;users have to restate identifiers and intent,&lt;/li&gt;
&lt;li&gt;the system becomes increasingly prompt-shaped rather than interaction-shaped.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What became clear to me is that short-term memory is not about storing everything forever. It is about preserving enough recent state for the current conversation to remain coherent.&lt;/p&gt;

&lt;p&gt;That distinction matters. I was not trying to build a knowledge base or semantic fact store. I was trying to answer a simpler question: how do I help the agent remember what we were just doing?&lt;/p&gt;

&lt;p&gt;Once I framed the problem that way, the architecture became much clearer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 2: The cleanest pattern is explicit memory, not implicit transcript magic
&lt;/h2&gt;

&lt;p&gt;Another lesson I learned quickly is that I did not want memory to be hidden behind vague runtime behavior. I wanted the agent code to make memory use explicit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;where memory comes from,&lt;/li&gt;
&lt;li&gt;when it is read,&lt;/li&gt;
&lt;li&gt;when it is written,&lt;/li&gt;
&lt;li&gt;which user it belongs to,&lt;/li&gt;
&lt;li&gt;which conversation it belongs to.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That led me to a pattern built around &lt;code&gt;MemoryClient&lt;/code&gt; and hooks.&lt;/p&gt;

&lt;p&gt;Instead of treating memory like a passive transcript that somehow appears at the edge of the request, I found it much more reliable to think about it as a lifecycle-managed dependency:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;create a short-term memory resource,&lt;/li&gt;
&lt;li&gt;pass the memory identity into the runtime,&lt;/li&gt;
&lt;li&gt;read recent turns when the agent initializes,&lt;/li&gt;
&lt;li&gt;write new messages as events when the conversation changes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The important shift for me was this: memory worked best when it was part of the agent object model, not just part of request handling glue code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 3: Hooks are where memory belongs
&lt;/h2&gt;

&lt;p&gt;This was probably the biggest implementation insight.&lt;/p&gt;

&lt;p&gt;Once I had a Strands-based agent running inside AgentCore Runtime, I needed to decide where the memory logic should live. I could have put everything directly into the entrypoint and manually stitched together request parsing, history retrieval, message persistence, and prompt injection. That would have worked, but it would have made the agent lifecycle harder to reason about.&lt;/p&gt;

&lt;p&gt;What worked better was using hooks tied to the agent lifecycle itself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;AgentInitializedEvent&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;MessageAddedEvent&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That structure gave me a much cleaner mental model.&lt;/p&gt;

&lt;p&gt;On initialization, the agent needs context before it reasons. That is the right moment to retrieve the most recent turns from memory and inject them into prompt context.&lt;/p&gt;

&lt;p&gt;When a new message is added, the conversation state has changed. That is the right moment to persist the latest user or assistant message back into memory.&lt;/p&gt;

&lt;p&gt;The core interaction looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;recent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_last_k_turns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;memory_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What I like about this model is that it is deterministic.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory load happens before reasoning,&lt;/li&gt;
&lt;li&gt;memory write happens when conversation state changes,&lt;/li&gt;
&lt;li&gt;both operations use the same identity boundaries,&lt;/li&gt;
&lt;li&gt;the entrypoint stays focused on request extraction rather than conversation orchestration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That made the system easier to debug, easier to extend, and much easier to explain.&lt;/p&gt;
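
&lt;p&gt;To show the shape of the pattern without any AWS dependency, here is a self-contained sketch. Both classes are stand-ins that I wrote for illustration: &lt;code&gt;InMemoryMemoryClient&lt;/code&gt; only mimics the two calls shown earlier, and the methods of &lt;code&gt;ShortTermMemoryHooks&lt;/code&gt; are hypothetical names for the callbacks you would register against &lt;code&gt;AgentInitializedEvent&lt;/code&gt; and &lt;code&gt;MessageAddedEvent&lt;/code&gt;. Neither reflects the real return types of the AgentCore or Strands APIs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;class InMemoryMemoryClient:
    """Stand-in for MemoryClient that keeps events in a dict (illustration only)."""

    def __init__(self):
        self._events = {}

    def create_event(self, memory_id, actor_id, session_id, messages):
        key = (memory_id, actor_id, session_id)
        self._events.setdefault(key, []).extend(messages)

    def get_last_k_turns(self, memory_id, actor_id, session_id, k=5):
        # Simplification: each stored message counts as one turn here.
        events = self._events.get((memory_id, actor_id, session_id), [])
        return events[-k:] if k else []


class ShortTermMemoryHooks:
    """Loads recent turns on initialization and persists each new message."""

    def __init__(self, memory_client, memory_id, actor_id, session_id):
        self.client = memory_client
        # The same identity boundary is used for both reads and writes.
        self.ids = dict(
            memory_id=memory_id, actor_id=actor_id, session_id=session_id
        )

    def on_agent_initialized(self, agent_state):
        # Runs before the agent reasons: inject recent context.
        agent_state["recent_turns"] = self.client.get_last_k_turns(k=5, **self.ids)

    def on_message_added(self, text, role):
        # Runs whenever the conversation changes: persist the new message.
        self.client.create_event(messages=[(text, role)], **self.ids)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because reads and writes share one set of identifiers, a second hooks object built with a different &lt;code&gt;session_id&lt;/code&gt; sees none of the first conversation's messages.&lt;/p&gt;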

&lt;h2&gt;
  
  
  Lesson 4: Identity is the real memory boundary
&lt;/h2&gt;

&lt;p&gt;Before building this, I thought of memory mostly as a storage problem. In practice, I learned it is just as much an identity problem.&lt;/p&gt;

&lt;p&gt;The two identifiers that mattered most were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;actor_id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;session_id&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation ended up being foundational.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why &lt;code&gt;actor_id&lt;/code&gt; matters
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;actor_id&lt;/code&gt; is the user boundary. If that identifier is unstable, absent, or inconsistent, memory quality degrades immediately.&lt;/p&gt;

&lt;p&gt;What I learned is that a memory system is only as good as the application identity you feed into it. If the same user appears under multiple IDs, the agent cannot retrieve a coherent conversational history. If two users are accidentally mapped to the same identity, memory becomes unsafe.&lt;/p&gt;

&lt;p&gt;So one of my strongest takeaways is that &lt;code&gt;actor_id&lt;/code&gt; should always come from a stable authenticated user identity, not from an incidental client-generated value.&lt;/p&gt;
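
&lt;p&gt;In code, that rule can be as small as refusing to proceed without a verified subject. This sketch assumes the request has already been authenticated and that the verified claims carry a &lt;code&gt;sub&lt;/code&gt; field, as OIDC-style tokens do; the function name is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def resolve_actor_id(claims):
    """Derive actor_id from verified identity claims, failing loudly if absent."""
    subject = (claims or {}).get("sub", "")
    if not isinstance(subject, str) or not subject.strip():
        # Never fall back to a client-generated value: that is how two
        # users end up sharing memory, or one user ends up with two histories.
        raise ValueError("authenticated subject is required to scope memory")
    return subject.strip()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;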

&lt;h3&gt;
  
  
  Why &lt;code&gt;session_id&lt;/code&gt; matters
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;session_id&lt;/code&gt; turned out to be just as important. A single user does not have just one conversation. They may have multiple active threads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one troubleshooting flow,&lt;/li&gt;
&lt;li&gt;one transcript analysis request,&lt;/li&gt;
&lt;li&gt;one abandoned conversation from earlier,&lt;/li&gt;
&lt;li&gt;one brand-new task.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without a session boundary, all of that collapses into one memory stream. The agent might technically “remember,” but it remembers too much of the wrong thing.&lt;/p&gt;

&lt;p&gt;That was a key lesson for me: useful memory is not just preserved memory. It is correctly scoped memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 5: The agent should be rebuilt per request, but memory should persist across requests
&lt;/h2&gt;

&lt;p&gt;This was an architectural point that became clearer as I implemented the runtime flow.&lt;/p&gt;

&lt;p&gt;The Strands agent instance itself is created per request. That makes sense because each invocation carries request-specific state:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the current user prompt,&lt;/li&gt;
&lt;li&gt;the active user identity,&lt;/li&gt;
&lt;li&gt;the active conversation session,&lt;/li&gt;
&lt;li&gt;the active tool and runtime context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But memory should not behave like request-local state. Memory has to outlive the agent instance and remain keyed to the same user and conversation across invocations.&lt;/p&gt;

&lt;p&gt;That split was important for me to internalize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;agent instance lifecycle is short,&lt;/li&gt;
&lt;li&gt;conversation memory lifecycle is longer,&lt;/li&gt;
&lt;li&gt;the link between them is established through state and hooks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once I started thinking in those terms, the design felt much more natural.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 6: Deployment is part of the memory design
&lt;/h2&gt;

&lt;p&gt;I originally thought of deployment as a separate concern from conversational behavior. Building this agent convinced me that the two are tightly connected.&lt;/p&gt;

&lt;p&gt;The runtime needs to know which memory resource it should use, but I did not want that decision hardcoded in application logic. The better pattern was to resolve the correct memory resource during deployment and pass that identity into the runtime as configuration.&lt;/p&gt;

&lt;p&gt;In practice, that meant the runtime received environment-specific values such as:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AGENT_NAME=&amp;lt;agent-name&amp;gt;
MEMORY_ID=&amp;lt;memory-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gave me a few benefits immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the same application code could move across environments,&lt;/li&gt;
&lt;li&gt;memory resources stayed aligned with environment boundaries,&lt;/li&gt;
&lt;li&gt;the runtime remained configurable without source changes,&lt;/li&gt;
&lt;li&gt;the control plane remained the primary place where resource binding happened.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One of the clearest lessons here is that memory should be treated like any other environment-bound infrastructure dependency. If it is not part of deployment, it tends to become a hidden assumption.&lt;/p&gt;
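
&lt;p&gt;On the application side, that translates into reading the binding at startup and failing fast when it is missing. A minimal sketch, assuming the two variable names shown above are the only required settings; the function name is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os


def load_runtime_config(environ=os.environ):
    """Read deployment-provided settings and fail fast if any are missing."""
    required = ("AGENT_NAME", "MEMORY_ID")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return {name: environ[name] for name in required}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A missing &lt;code&gt;MEMORY_ID&lt;/code&gt; then shows up as a startup error in the environment that is misconfigured, rather than as an agent that quietly forgets everything.&lt;/p&gt;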

&lt;h2&gt;
  
  
  Lesson 7: Short-term memory and long-term memory solve different problems
&lt;/h2&gt;

&lt;p&gt;I found it helpful to stop using the word “memory” as if it meant one thing.&lt;/p&gt;

&lt;p&gt;Short-term memory answers the question:&lt;/p&gt;

&lt;p&gt;"What was happening in this conversation recently?"&lt;/p&gt;

&lt;p&gt;Long-term memory answers a different question:&lt;/p&gt;

&lt;p&gt;"What durable information should the system remember beyond this immediate interaction?"&lt;/p&gt;

&lt;p&gt;For the agent I was building, the short-term problem came first. I needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;recent-turn continuity,&lt;/li&gt;
&lt;li&gt;bounded replay,&lt;/li&gt;
&lt;li&gt;session-scoped context,&lt;/li&gt;
&lt;li&gt;predictable event retention.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I did not need semantic fact retrieval in the first phase. I did not need vector search for historical knowledge. I needed the agent to remain coherent across adjacent turns.&lt;/p&gt;

&lt;p&gt;That was an important design simplification. It kept the first version of the memory architecture focused on event continuity instead of overextending into knowledge retrieval prematurely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 8: Recent-turn replay should be bounded
&lt;/h2&gt;

&lt;p&gt;Once I had memory retrieval working, the next question was how much of it to inject back into the agent context.&lt;/p&gt;

&lt;p&gt;My lesson here was simple: more memory is not always better memory.&lt;/p&gt;

&lt;p&gt;If too much prior conversation is replayed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompt size grows,&lt;/li&gt;
&lt;li&gt;token cost grows,&lt;/li&gt;
&lt;li&gt;stale context starts competing with the current task,&lt;/li&gt;
&lt;li&gt;reasoning quality can actually decline.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I found the most practical pattern was to retrieve the last few turns and inject them into prompt context in a compact representation. In this design, that replay window was bounded at five turns.&lt;/p&gt;

&lt;p&gt;That gave me a good balance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;enough recent context for continuity,&lt;/li&gt;
&lt;li&gt;small enough context for predictable prompt growth,&lt;/li&gt;
&lt;li&gt;simple enough formatting to inspect and debug.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This also reinforced another lesson: short-term memory should be operationally understandable. I want to know what context the model saw, not just trust that some opaque memory layer handled it correctly.&lt;/p&gt;
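&lt;p&gt;A bounded replay window can be sketched in a few lines. This is an illustration under assumptions: each turn is modeled as a dictionary with &lt;code&gt;role&lt;/code&gt; and &lt;code&gt;text&lt;/code&gt; keys, and &lt;code&gt;build_replay_context&lt;/code&gt; is a hypothetical helper rather than a framework API.&lt;/p&gt;

```python
MAX_REPLAY_TURNS = 5  # replay window used in this design


def build_replay_context(turns, max_turns=MAX_REPLAY_TURNS):
    """Return the most recent turns as compact 'role: text' lines."""
    # A zero window means no replay; slicing with -0 would return everything.
    recent = turns[-max_turns:] if max_turns else []
    return "\n".join(f"{t['role']}: {t['text'].strip()}" for t in recent)
```

&lt;p&gt;Because the output is plain text with one line per turn, it can be logged next to the request to show what context the model actually received.&lt;/p&gt;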

&lt;h2&gt;
  
  
  Lesson 9: Memory becomes more valuable when tools are involved
&lt;/h2&gt;

&lt;p&gt;The agent I built was not just a conversational shell. It had tools, including domain-specific behavior such as transcript retrieval and AWS interactions.&lt;/p&gt;

&lt;p&gt;That is where the value of short-term memory became even more obvious.&lt;/p&gt;

&lt;p&gt;In a tool-using workflow, the user often does not repeat the full context every turn. They say things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"use the same meeting"&lt;/li&gt;
&lt;li&gt;"what did the second speaker say?"&lt;/li&gt;
&lt;li&gt;"now summarize that"&lt;/li&gt;
&lt;li&gt;"check the S3 output from before"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without memory, the agent has to reconstruct working state from a single prompt. With memory, the agent has a much better chance of preserving:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the active object under discussion,&lt;/li&gt;
&lt;li&gt;the prior user instruction,&lt;/li&gt;
&lt;li&gt;the last tool result,&lt;/li&gt;
&lt;li&gt;the intended next step.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One of my strongest takeaways is that memory is not just a conversational improvement. It is a workflow improvement. It makes tool orchestration across turns materially more coherent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 10: Failure modes need to be designed, not discovered in production
&lt;/h2&gt;

&lt;p&gt;Building this also made me think much more carefully about degraded behavior.&lt;/p&gt;

&lt;p&gt;If memory resolution fails and the runtime cannot find a memory resource, the agent may still run. That sounds convenient, but it also means the system may silently shift from stateful to stateless behavior.&lt;/p&gt;

&lt;p&gt;That taught me to treat the following as first-class operational conditions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory enabled,&lt;/li&gt;
&lt;li&gt;memory disabled,&lt;/li&gt;
&lt;li&gt;memory load succeeded,&lt;/li&gt;
&lt;li&gt;memory write succeeded,&lt;/li&gt;
&lt;li&gt;memory resolution failed,&lt;/li&gt;
&lt;li&gt;identity inputs were missing or malformed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The same thing applies to identity mistakes.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;actor_id&lt;/code&gt; is unstable, memory becomes fragmented.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;session_id&lt;/code&gt; is reused incorrectly, unrelated conversations bleed into each other.&lt;/p&gt;

&lt;p&gt;If replay windows grow without discipline, prompt quality degrades.&lt;/p&gt;

&lt;p&gt;These are not edge cases. They are part of the normal operating surface of a memory-aware agent.&lt;/p&gt;
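&lt;p&gt;One way to make these conditions visible is to classify them explicitly on every request instead of letting the agent fall back to stateless behavior silently. The sketch below is illustrative only: &lt;code&gt;MemoryStatus&lt;/code&gt; and &lt;code&gt;resolve_memory_status&lt;/code&gt; are hypothetical names, and the validation rules are assumptions.&lt;/p&gt;

```python
import logging
from enum import Enum

logger = logging.getLogger("agent.memory")


class MemoryStatus(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"
    RESOLUTION_FAILED = "resolution_failed"
    IDENTITY_INVALID = "identity_invalid"


def resolve_memory_status(memory_id, actor_id, session_id, memory_enabled=True):
    """Classify the memory condition for one request and log degraded states."""
    if not memory_enabled:
        return MemoryStatus.DISABLED
    if not memory_id:
        logger.warning("memory resolution failed; request will run stateless")
        return MemoryStatus.RESOLUTION_FAILED
    # Both identity inputs must be non-empty strings to scope memory correctly.
    identity = (actor_id, session_id)
    if not all(isinstance(value, str) and value.strip() for value in identity):
        logger.warning("actor_id or session_id is missing or malformed")
        return MemoryStatus.IDENTITY_INVALID
    return MemoryStatus.ENABLED
```

&lt;p&gt;Emitting the returned status as a metric or log field lets operators see when a deployment has drifted from stateful to stateless behavior.&lt;/p&gt;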

&lt;h2&gt;
  
  
  Lesson 11: Retention, privacy, and compliance show up earlier than expected
&lt;/h2&gt;

&lt;p&gt;Short-term memory sounds lightweight, but it is still stored interaction data.&lt;/p&gt;

&lt;p&gt;That means retention policy is not just a platform setting. It is part of the product design. While building this, I became much more aware that memory decisions quickly intersect with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;data handling policy,&lt;/li&gt;
&lt;li&gt;privacy expectations,&lt;/li&gt;
&lt;li&gt;deletion and retention requirements,&lt;/li&gt;
&lt;li&gt;security review,&lt;/li&gt;
&lt;li&gt;production observability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The technical implementation can be elegant, but if these operational questions are not addressed early, the design will be incomplete.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lesson 12: AgentCore became more useful to me when I treated it as a runtime system, not just a hosting target
&lt;/h2&gt;

&lt;p&gt;This may be the broadest lesson of all.&lt;/p&gt;

&lt;p&gt;At first, I thought of AgentCore Runtime mainly as the place where the agent container would run. But while building with memory, I started appreciating it more as a runtime environment with clear operational boundaries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the runtime executes the agent,&lt;/li&gt;
&lt;li&gt;the framework manages reasoning and tools,&lt;/li&gt;
&lt;li&gt;the memory plane manages event continuity,&lt;/li&gt;
&lt;li&gt;the deployment workflow binds the right resources together.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That view helped me move beyond “deploy a model wrapper in a container” toward “operate an agent system with state, identity, and lifecycle.”&lt;/p&gt;

&lt;p&gt;For me, that was the real shift.&lt;/p&gt;

&lt;h2&gt;
  
  
  The technical pattern I would reuse
&lt;/h2&gt;

&lt;p&gt;If I were building the same class of agent again, I would reuse the same high-level pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a dedicated short-term memory resource.&lt;/li&gt;
&lt;li&gt;Resolve the correct memory resource during deployment.&lt;/li&gt;
&lt;li&gt;Pass memory identity into the runtime explicitly.&lt;/li&gt;
&lt;li&gt;Build the agent per request with user and session state.&lt;/li&gt;
&lt;li&gt;Load recent turns during agent initialization.&lt;/li&gt;
&lt;li&gt;Persist new messages when they are added.&lt;/li&gt;
&lt;li&gt;Keep replay windows bounded.&lt;/li&gt;
&lt;li&gt;Treat &lt;code&gt;actor_id&lt;/code&gt; and &lt;code&gt;session_id&lt;/code&gt; as core correctness boundaries.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I would also keep the same mental model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;short-term memory is for continuity,&lt;/li&gt;
&lt;li&gt;long-term memory is for durable recall,&lt;/li&gt;
&lt;li&gt;hooks are the right place for memory orchestration,&lt;/li&gt;
&lt;li&gt;deployment is part of memory architecture,&lt;/li&gt;
&lt;li&gt;observability should make degraded memory behavior visible.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;The biggest lesson I learned while building with Amazon Bedrock AgentCore Runtime is that memory is not something you sprinkle onto an agent once the rest of the system works. Memory changes the shape of the system.&lt;/p&gt;

&lt;p&gt;It affects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;request lifecycle,&lt;/li&gt;
&lt;li&gt;identity boundaries,&lt;/li&gt;
&lt;li&gt;prompt construction,&lt;/li&gt;
&lt;li&gt;deployment,&lt;/li&gt;
&lt;li&gt;observability,&lt;/li&gt;
&lt;li&gt;privacy,&lt;/li&gt;
&lt;li&gt;and tool coherence across turns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once I accepted that, the architecture became much more disciplined. The agent became easier to reason about, easier to operate, and much more capable in real multi-turn interactions.&lt;/p&gt;

&lt;p&gt;That is the lesson I would carry into any future AgentCore build: if the experience is meant to feel conversational, memory has to be designed as a first-class runtime concern from the beginning.&lt;/p&gt;

</description>
      <category>agentcore</category>
      <category>aws</category>
      <category>serverless</category>
      <category>agentskills</category>
    </item>
    <item>
      <title>API Gateway as Websocket</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Tue, 21 Jan 2025 07:49:42 +0000</pubDate>
      <link>https://dev.to/amitkayal/api-gateway-as-websocket-5eee</link>
      <guid>https://dev.to/amitkayal/api-gateway-as-websocket-5eee</guid>
      <description>&lt;h1&gt;
  
  
  API Gateway as websocket
&lt;/h1&gt;

&lt;h2&gt;
  
  
  API Gateway as WS Components
&lt;/h2&gt;

&lt;p&gt;WebSocket provides bidirectional, session-aware communication between caller and receiver, and it is a crucial component of real-time applications.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Setup API Gateway for WebSocket&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a WebSocket API in the Amazon API Gateway console or through IaC.&lt;/li&gt;
&lt;li&gt;Define the WebSocket API route selection expression, e.g., &lt;code&gt;$request.body.action&lt;/code&gt;. API Gateway evaluates it against each incoming message to pick the route, so routes act as a bridge between incoming messages and the backend integrations that handle them.&lt;/li&gt;
&lt;li&gt;Define the following WebSocket routes:

&lt;ul&gt;
&lt;li&gt;$connect: Triggered when a client establishes a connection.&lt;/li&gt;
&lt;li&gt;$disconnect: Triggered when a client disconnects.&lt;/li&gt;
&lt;li&gt;Custom routes, e.g., sendMessage, to handle specific actions.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Create an Integration with AWS Lambda&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For each route ($connect, $disconnect, custom routes), integrate a Lambda function to handle the respective logic.&lt;/li&gt;
&lt;li&gt;Use the Lambda function's handler to process:

&lt;ul&gt;
&lt;li&gt;$connect: Store the connection in DynamoDB.&lt;/li&gt;
&lt;li&gt;$disconnect: Remove the connection from DynamoDB.&lt;/li&gt;
&lt;li&gt;Custom routes: Process the message and forward it to SQS.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;DynamoDB for Connection Management&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a DynamoDB table to store:

&lt;ul&gt;
&lt;li&gt;Connection ID (Primary Key).&lt;/li&gt;
&lt;li&gt;Session ID or other metadata for grouping connections.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;This table allows tracking active WebSocket connections for broadcasting messages.&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Configure SQS for Message Queue&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use an SQS FIFO queue for guaranteed order and deduplication.&lt;/li&gt;
&lt;li&gt;Messages processed in Lambda (custom routes) are sent to SQS for downstream services.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;IAM Roles and Permissions&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assign an IAM role to the API Gateway to invoke the integrated Lambda functions.&lt;/li&gt;
&lt;li&gt;Grant Lambda permissions to read/write from DynamoDB and send messages to SQS.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Client Connection and Messaging&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use WebSocket-compatible libraries (e.g., ws in Node.js or the WebSocket API in browsers) to:
&lt;ul&gt;
&lt;li&gt;Establish a WebSocket connection to the API Gateway endpoint.&lt;/li&gt;
&lt;li&gt;Send and receive messages using the WebSocket protocol.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
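&lt;p&gt;The route handling described above can be sketched as a single dispatch function. This is a simplified illustration: a dictionary stands in for the DynamoDB table and a list stands in for the SQS queue, and &lt;code&gt;handle_event&lt;/code&gt; is a hypothetical name. The &lt;code&gt;routeKey&lt;/code&gt; and &lt;code&gt;connectionId&lt;/code&gt; fields are read from the &lt;code&gt;requestContext&lt;/code&gt; of the event that API Gateway passes to Lambda.&lt;/p&gt;

```python
import json


def handle_event(event, connections, outbox):
    """Dispatch one WebSocket event by route key.

    connections: dict standing in for the DynamoDB connection table
    outbox: list standing in for the SQS queue
    """
    ctx = event["requestContext"]
    route, connection_id = ctx["routeKey"], ctx["connectionId"]

    if route == "$connect":
        # Record the new connection so it can receive messages later.
        connections[connection_id] = {"connectionId": connection_id}
    elif route == "$disconnect":
        connections.pop(connection_id, None)
    else:
        # Custom route such as sendMessage: forward the payload downstream.
        body = json.loads(event.get("body") or "{}")
        outbox.append({"connectionId": connection_id, "route": route, "body": body})

    return {"statusCode": 200}
```

&lt;p&gt;In a real deployment the two stand-ins would be replaced by DynamoDB and SQS clients, and the function would be wrapped in a standard Lambda handler that takes &lt;code&gt;event&lt;/code&gt; and &lt;code&gt;context&lt;/code&gt;.&lt;/p&gt;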

&lt;h2&gt;
  
  
  Architecture of Websocket mechanism
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;WebSocket Client:

&lt;ul&gt;
&lt;li&gt;Initiates WebSocket connection and communicates via send() and onmessage().&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;API Gateway (WebSocket API):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manages WebSocket connections and invokes Lambda functions for defined routes.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Route Integration (Lambda Functions):&lt;br&gt;
Every route should have an integration. There are three types: Mock, HTTP, and Lambda.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$connect: Adds connection metadata to DynamoDB.&lt;/li&gt;
&lt;li&gt;$disconnect: Removes connection metadata from DynamoDB.&lt;/li&gt;
&lt;li&gt;$default: Selected when the route selection expression cannot be matched to any defined route for the incoming message.&lt;/li&gt;
&lt;li&gt;Custom routes: Invoke the integration chosen from the message content, process the message, and forward it to SQS.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;DynamoDB:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maintains active connection records, including connectionId and associated metadata.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;SQS FIFO Queue:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Queues messages for downstream processing, ensuring delivery order and deduplication.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Downstream Services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Processes messages from SQS and performs actions like notifications, data updates, or storage.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Security
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Authentication and Authorization
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Custom Authorizer (Lambda Authorizer)&lt;br&gt;
It can only be used for the $connect route.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a Lambda Authorizer to validate custom tokens or headers sent during connection attempts.&lt;/li&gt;
&lt;li&gt;Example:

&lt;ul&gt;
&lt;li&gt;Validate a JWT token from an identity provider (e.g., Cognito, Auth0).&lt;/li&gt;
&lt;li&gt;Check the token against allowed users or roles.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Amazon Cognito:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Amazon Cognito for user authentication.&lt;/li&gt;
&lt;li&gt;Validate Cognito-issued tokens in connection requests. For WebSocket APIs this is typically done inside a Lambda authorizer on the $connect route.&lt;/li&gt;
&lt;li&gt;Best suited for applications with user pools.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
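&lt;p&gt;A Lambda authorizer for the $connect route returns an IAM policy that allows or denies &lt;code&gt;execute-api:Invoke&lt;/code&gt; on the requested resource. The sketch below assumes the client sends its token as a &lt;code&gt;token&lt;/code&gt; query string parameter, and &lt;code&gt;is_token_valid&lt;/code&gt; is a hypothetical placeholder that a real system would replace with proper JWT verification against the identity provider.&lt;/p&gt;

```python
def is_token_valid(token):
    """Hypothetical placeholder; replace with real JWT signature and claim checks."""
    return token == "demo-token"


def authorize_connect(event):
    """Build a Lambda authorizer response for a $connect request."""
    params = event.get("queryStringParameters") or {}
    effect = "Allow" if is_token_valid(params.get("token")) else "Deny"
    return {
        "principalId": "user",  # illustrative; normally derived from the token
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": event["methodArn"],
                }
            ],
        },
    }
```

&lt;p&gt;Returning an explicit Deny for invalid tokens keeps rejected connection attempts visible in logs instead of surfacing as unhandled errors.&lt;/p&gt;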

&lt;h3&gt;
  
  
  Secure WebSocket Connections
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Always use the secure WebSocket protocol (wss://). API Gateway enforces HTTPS/TLS, ensuring encrypted communication.&lt;/li&gt;
&lt;li&gt;Associate a custom domain with the API Gateway WebSocket endpoint. We should use AWS Certificate Manager (ACM) to manage SSL/TLS certificates.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  IP Whitelisting and Blacklisting
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;We should attach AWS WAF to API Gateway and block or allow requests based on IP addresses or CIDR ranges. We should also use rate limiting to help protect against DDoS attacks.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  API Gateway Throttling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;We can set rate and burst limits on API Gateway routes to limit the number of connections per client.&lt;/li&gt;
&lt;li&gt;We can create API keys, associate them with a usage plan, and then limit the number of allowed requests per API key.&lt;/li&gt;
&lt;/ul&gt;
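&lt;p&gt;Rate and burst limits are easiest to reason about through the token bucket model, which API Gateway uses for throttling: the rate is how fast tokens refill and the burst is the bucket capacity. The class below is a simplified local illustration of that model, not API Gateway's implementation; &lt;code&gt;TokenBucket&lt;/code&gt; and its parameters are hypothetical names.&lt;/p&gt;

```python
class TokenBucket:
    """Simplified token bucket: 'rate' tokens per second, capacity 'burst'."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.last = now

    def allow(self, now):
        """Return True if a request arriving at time 'now' is within limits."""
        elapsed = max(0.0, now - self.last)
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

&lt;p&gt;With a rate of 2 and a burst of 3, three back-to-back requests pass, the fourth is rejected, and one more request is admitted after half a second.&lt;/p&gt;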

&lt;h3&gt;
  
  
  Environment-based Access Control:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;We should always use distinct stages (e.g., dev, prod) and restrict connections to the production API through IP rules.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tools to test
&lt;/h2&gt;

&lt;p&gt;The following tools can be used to test a WebSocket API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Piesocket&lt;/li&gt;
&lt;li&gt;Postman&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>apigateway</category>
      <category>api</category>
    </item>
    <item>
      <title>S3 table &amp; S3 Metadata table</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Mon, 09 Dec 2024 18:26:23 +0000</pubDate>
      <link>https://dev.to/aws-builders/s3-table-s3-metadata-table-91i</link>
      <guid>https://dev.to/aws-builders/s3-table-s3-metadata-table-91i</guid>
      <description>&lt;h2&gt;
  
  
  Open table format and its architecture
&lt;/h2&gt;

&lt;p&gt;Open table formats, such as Apache Iceberg, Apache Hudi, and Delta Lake, have gained popularity in data analytics mainly because of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ACID Transactions: Open table formats (e.g., Apache Iceberg, Delta Lake) ensure reliable and consistent data updates, even with concurrent access.&lt;/li&gt;
&lt;li&gt;Schema Evolution: They allow seamless updates to schemas without disrupting existing pipelines, simplifying data management. Metadata tracks the changes to the dataset: the files held in the data layer are captured by the metadata files held in the metadata layer, and as the data files change, the metadata files track those changes.&lt;/li&gt;
&lt;li&gt;Optimized Queries: Partitioning and indexing enable faster queries by scanning only relevant data, improving performance and cost-efficiency.&lt;/li&gt;
&lt;li&gt;Time Travel: Users can access historical versions of data for debugging, compliance, or analytics.&lt;/li&gt;
&lt;li&gt;Interoperability: These formats integrate seamlessly with big data tools like Spark, Flink, and Presto, making them versatile and widely adopted.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Open file format
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkl9mm5r6t0aqp4uy7dqa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkl9mm5r6t0aqp4uy7dqa.png" alt="img" width="750" height="588"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  S3 table
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;p&gt;Amazon S3 Tables is optimized for analytics workloads. It is designed to continuously enhance query performance and reduce storage costs for tabular data. This solution looks very promising if you are working with a lakehouse architecture.&lt;br&gt;
&lt;strong&gt;A new bucket type, the table bucket, has been introduced to support this; it organizes tables as sub-resources. Like any other AWS resource, it has an ARN and can take a resource policy, and as a unique feature it has a dedicated endpoint.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3 Tables are intended explicitly for storing data in a tabular format, such as daily purchase transactions, streaming sensor data, or ad impressions. This data is organized into columns and rows like a database table.&lt;/li&gt;
&lt;li&gt;Table buckets support storing tables in the Apache Iceberg format. You can query these tables using standard SQL in query engines that support Iceberg.&lt;/li&gt;
&lt;li&gt;Reads and writes are allowed on data files and metadata files. Deletes and updates are not allowed, in order to preserve data integrity.&lt;/li&gt;
&lt;li&gt;Compatible query engines include Amazon Athena, Amazon Redshift, and Apache Spark.&lt;/li&gt;
&lt;li&gt;S3 Table automatically performs maintenance tasks like compaction and snapshot management to optimize your tables for querying, including removing unreferenced files.&lt;/li&gt;
&lt;li&gt;S3 Table offers access management for both the table and the bucket.&lt;/li&gt;
&lt;li&gt;It provides fully managed Apache Iceberg tables in S3.&lt;/li&gt;
&lt;li&gt;It supports automatic compaction of the underlying files to improve query performance and tune them further for better latency.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  S3 Table buckets namespace
&lt;/h3&gt;

&lt;p&gt;A namespace logically groups related S3 tables together, which gives us finer control over those tables. It helps with the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;logical segmentation of data and multi tenancy

&lt;ul&gt;
&lt;li&gt;Supports multi-tenancy through separate namespaces, which helps with data isolation requirements in regulated industries.&lt;/li&gt;
&lt;li&gt;Separates tables by application, project, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;prevent naming conflicts

&lt;ul&gt;
&lt;li&gt;Each namespace acts like a "container," allowing tables with the same name in different namespaces without conflicts.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Better Access Control

&lt;ul&gt;
&lt;li&gt;Policies can grant or restrict access to specific namespaces, ensuring data security and compliance.  It also reduces the risk of unauthorized access to unrelated tables in the same bucket.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Easy data management

&lt;ul&gt;
&lt;li&gt;Makes it easier to query, update, or delete related tables in bulk.&lt;/li&gt;
&lt;li&gt;Simplifies metadata management for tables grouped under a namespace.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Advanced workflows based on namespace

&lt;ul&gt;
&lt;li&gt;It helps to simplify automation for data pipelines or real-time analytics applications.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  S3 table operation &amp;amp; management
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Table Operation&lt;/strong&gt;&lt;br&gt;
These are quite similar to CRUD operations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;list tables&lt;/li&gt;
&lt;li&gt;create tables&lt;/li&gt;
&lt;li&gt;Get table metadata location&lt;/li&gt;
&lt;li&gt;Update table metadata location&lt;/li&gt;
&lt;li&gt;Delete Table&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Table Management&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Put Table Policy&lt;/li&gt;
&lt;li&gt;Put Table Bucket Policy&lt;/li&gt;
&lt;li&gt;Put Table Maintenance Config&lt;/li&gt;
&lt;li&gt;Put Table Bucket Maintenance Config&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Policies related to S3 table operation
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Allow access to create and use table buckets
&lt;/h3&gt;

&lt;p&gt;Here, &lt;code&gt;Action&lt;/code&gt; lists the specific actions the policy allows.&lt;/p&gt;

&lt;p&gt;These actions are specific to S3 Tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;s3tables:CreateTableBucket: Grants permission to create a table bucket in S3 Tables. &lt;/li&gt;
&lt;li&gt;s3tables:PutTableBucketPolicy: Allows setting or updating the bucket policy for a table bucket. &lt;/li&gt;
&lt;li&gt;s3tables:GetTableBucketPolicy: Allows retrieving the bucket policy associated with a table bucket. &lt;/li&gt;
&lt;li&gt;s3tables:ListTableBuckets: Allows listing all table buckets within the specified scope. &lt;/li&gt;
&lt;li&gt;s3tables:GetTableBucket: Grants permission to access the metadata of a specific table bucket.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;Resource&lt;/code&gt; defines the scope of the resources these actions can apply to.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"arn:aws:s3tables:region:account_id:bucket/*": Specifies all table buckets in the account (account_id) and region (region).&lt;/li&gt;
&lt;li&gt;The * after bucket/ indicates that permissions apply to all buckets under this account and region.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowBucketActionsForUser",
        "Effect": "Allow",
        "Action": [
            "s3tables:CreateTableBucket",
            "s3tables:PutTableBucketPolicy",
            "s3tables:GetTableBucketPolicy",
            "s3tables:ListTableBuckets",
            "s3tables:GetTableBucket"
        ],
        "Resource": "arn:aws:s3tables:region:account_id:bucket/*"
    }]
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Allow access to create and use tables in a table bucket
&lt;/h3&gt;

&lt;p&gt;Here, &lt;code&gt;Action&lt;/code&gt; lists the specific S3 Tables actions allowed by the policy. &lt;em&gt;Please note that the first policy focused on creating and managing table buckets and their associated metadata. It did not include table-level actions such as creating tables, querying data, or updating metadata, which are the operations where namespaces become relevant.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;s3tables:CreateTable: Allows creating new tables in the specified table bucket. &lt;/li&gt;
&lt;li&gt;s3tables:PutTableData: Grants permission to write data to tables within the table bucket. &lt;/li&gt;
&lt;li&gt;s3tables:GetTableData: Allows reading data from tables in the bucket.&lt;/li&gt;
&lt;li&gt;s3tables:GetTableMetadataLocation: Allows retrieving metadata location information for a table.&lt;/li&gt;
&lt;li&gt;s3tables:UpdateTableMetadataLocation: Grants permission to update the metadata location of a table. &lt;/li&gt;
&lt;li&gt;s3tables:GetNamespace: Allows retrieving namespace information associated with the table bucket. &lt;/li&gt;
&lt;li&gt;s3tables:CreateNamespace: Grants permission to create namespaces for organizing table data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Resource section:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grants permissions on the bucket named amzn-s3-demo-table-bucket.&lt;/li&gt;
&lt;li&gt;Grants permissions on all tables within amzn-s3-demo-table-bucket.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
     "Version": "2012-10-17",
     "Statement": [ 
         {
             "Sid": "AllowBucketActions",
             "Effect": "Allow",
             "Action": [
                 "s3tables:CreateTable",
                 "s3tables:PutTableData",
                 "s3tables:GetTableData",
                 "s3tables:GetTableMetadataLocation",
                 "s3tables:UpdateTableMetadataLocation",
                 "s3tables:GetNamespace",
                 "s3tables:CreateNamespace"
             ],

             "Resource": [
               "arn:aws:s3tables:region:account_id:bucket/amzn-s3-demo-table-bucket",
               "arn:aws:s3tables:region:account_id:bucket/amzn-s3-demo-table-bucket/table/*"
            ]
         }
     ]
 }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Table bucket policy to allow read access to the namespace
&lt;/h4&gt;

&lt;p&gt;This policy allows reading S3 tables from a namespace. Here, &lt;code&gt;Action&lt;/code&gt; lists the specific S3 Tables actions allowed by the policy.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;s3tables:GetTableData: Allows reading data from tables in the bucket.&lt;/li&gt;
&lt;li&gt;s3tables:GetTableMetadataLocation: Allows retrieving metadata location information for a table.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Resource section covers all S3 tables under the bucket amzn-s3-demo-table-bucket1, and the s3tables:namespace condition then restricts access to only the tables in the hr namespace.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
     "Version": "2012-10-17",
     "Statement": [
         {
             "Effect": "Allow",
             "Principal": {
               "AWS": "arn:aws:iam::123456789012:user/Jane"
             },
             "Action": [
                  "s3tables:GetTableData",
                  "s3tables:GetTableMetadataLocation"
             ],
             "Resource": "arn:aws:s3tables:region:account_id:bucket/amzn-s3-demo-table-bucket1/table/*",
             "Condition": {
                  "StringLike": { "s3tables:namespace": "hr" }
             }
         }
     ]
 }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
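&lt;p&gt;When the same read-only policy is needed for several namespaces, it can be generated rather than hand-edited, which avoids JSON syntax slips. The helper below is a sketch: &lt;code&gt;namespace_read_policy&lt;/code&gt; is a hypothetical function, and it only reuses the actions, resource pattern, and condition key shown in the policy above.&lt;/p&gt;

```python
import json


def namespace_read_policy(table_bucket_arn, principal_arn, namespace):
    """Return a table bucket policy granting read access to one namespace."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": [
                    "s3tables:GetTableData",
                    "s3tables:GetTableMetadataLocation",
                ],
                # All tables in the bucket; the condition narrows the scope.
                "Resource": f"{table_bucket_arn}/table/*",
                "Condition": {"StringLike": {"s3tables:namespace": namespace}},
            }
        ],
    }


policy = namespace_read_policy(
    "arn:aws:s3tables:region:account_id:bucket/amzn-s3-demo-table-bucket1",
    "arn:aws:iam::123456789012:user/Jane",
    "hr",
)
print(json.dumps(policy, indent=2))
```

&lt;p&gt;Serializing with &lt;code&gt;json.dumps&lt;/code&gt; guarantees the output is syntactically valid JSON before it is attached to the table bucket.&lt;/p&gt;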

&lt;h2&gt;
  
  
  S3 table automatic maintenance
&lt;/h2&gt;

&lt;p&gt;It provides automated maintenance through configurations that help simplify table management, optimize performance, and reduce operational overhead.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Table Lifecycle Management

&lt;ul&gt;
&lt;li&gt;We can add S3 Table configurations that include lifecycle policies to automatically handle data expiration, transitions, or archival.&lt;/li&gt;
&lt;li&gt;Automatic snapshot expiration can be configured easily.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Data Compaction

&lt;ul&gt;
&lt;li&gt;S3 Tables automatically compact small files (often produced by incremental writes) into larger, optimized files. This speeds up queries and reduces storage cost.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Schema Evolution

&lt;ul&gt;
&lt;li&gt;Automated checks ensure compatibility between new and existing data.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Metadata Optimization

&lt;ul&gt;
&lt;li&gt;Indexing of metadata for faster querying and retrieval of table details.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of these can be set up through policy-based configuration.&lt;/p&gt;
&lt;h3&gt;
  
  
  Policy for snapshot management
&lt;/h3&gt;

&lt;p&gt;By configuring MaximumSnapshotAge, we can specify the retention period for table snapshots. The following example ensures that S3 Table automatically retains only the snapshots from the last 30 days (720 hours).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MinimumSnapshots: Ensures that at least one snapshot is always retained, regardless of age. &lt;/li&gt;
&lt;li&gt;MaximumSnapshotAge: Specifies the maximum age (in hours) for snapshots to be retained.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws s3tables put-table-maintenance-configuration \
    --table-arn arn:aws:s3tables:region:account_id:bucket/bucket_name/table/table_name \
    --maintenance-configuration '{
        "SnapshotManagement": {
            "MinimumSnapshots": 1,
            "MaximumSnapshotAge": 720
        }
    }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  S3 Table Integration with AWS Analytics
&lt;/h2&gt;

&lt;p&gt;S3 Tables integrate seamlessly with AWS analytics services to enable querying, processing and insight generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Athena - Run serverless SQL queries on S3 Tables&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use AWS Glue to create a Data Catalog for S3 Tables.&lt;/li&gt;
&lt;li&gt;Query data directly using SQL in Athena.&lt;/li&gt;
&lt;li&gt;Leverage table formats like Apache Iceberg or Parquet for optimized performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS Glue - Automate ETL processes for S3 Tables&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Glue Crawlers to discover table metadata.&lt;/li&gt;
&lt;li&gt;Create ETL jobs to transform and load data into S3 Tables or other destinations.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  S3 Metadata table
&lt;/h2&gt;

&lt;p&gt;The S3 Metadata table includes system metadata, object tags, and user-defined metadata. It is stored in an S3 table and is generated in near real time as data is created, so it can be queried within minutes.&lt;/p&gt;
&lt;h3&gt;
  
  
  Use case for S3 metadata table
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Real-Time Analytics

&lt;ul&gt;
&lt;li&gt;Run efficient queries on metadata to identify relevant data partitions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Machine Learning Pipelines

&lt;ul&gt;
&lt;li&gt;Use metadata tables to filter, select, and partition data for model training.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Governance and Compliance

&lt;ul&gt;
&lt;li&gt;Track data retention and enforce lifecycle policies via metadata.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Multi-Tenant Data Applications

&lt;ul&gt;
&lt;li&gt;Use namespaces within metadata tables to logically isolate tenant data.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Data Cataloging and Discovery

&lt;ul&gt;
&lt;li&gt;Use metadata queries to identify datasets matching specific criteria.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is a sample Python function that queries the metadata table through Athena.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import time

import boto3

# Placeholder values (assumptions): replace with your own database, table, and results location
DATABASE = "my_database"
TABLE = "my_metadata_table"
S3_OUTPUT = "s3://my-athena-results-bucket/output/"

athena_client = boto3.client("athena")


def query_metadata_table(criteria):
    """Query the metadata table in Athena and return the matching rows."""
    # criteria is inserted into the SQL as-is, so only pass trusted input
    query = f"""
        SELECT *
        FROM {DATABASE}.{TABLE}
        WHERE {criteria}
    """

    print(f"Running query: {query}")

    # Start Athena query
    response = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={'Database': DATABASE},
        ResultConfiguration={'OutputLocation': S3_OUTPUT}
    )

    query_execution_id = response['QueryExecutionId']

    # Poll until the query reaches a terminal state
    print("Waiting for query to complete...")
    while True:
        status = athena_client.get_query_execution(QueryExecutionId=query_execution_id)
        state = status['QueryExecution']['Status']['State']
        if state in ['SUCCEEDED', 'FAILED', 'CANCELLED']:
            break
        time.sleep(2)

    if state != 'SUCCEEDED':
        raise RuntimeError(f"Query failed with state: {state}")

    # Retrieve the first page of results (use a paginator for large result sets)
    results = athena_client.get_query_results(QueryExecutionId=query_execution_id)
    datasets = []
    for row in results['ResultSet']['Rows'][1:]:  # Skip the header row
        # NULL values have no 'VarCharValue' key, so fall back to None
        datasets.append([col.get('VarCharValue') for col in row['Data']])

    print(f"Query returned {len(datasets)} datasets matching the criteria.")
    return datasets
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
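
&lt;p&gt;Because &lt;code&gt;query_metadata_table&lt;/code&gt; takes its criteria as plain SQL text, it can help to build that text in one place. The helper below is a hypothetical sketch that handles equality filters only; the column names &lt;code&gt;storage_class&lt;/code&gt; and &lt;code&gt;record_type&lt;/code&gt; are assumptions about the metadata table schema and should be checked against your own table.&lt;/p&gt;

```python
def build_criteria(filters):
    """Build the body of a SQL WHERE clause from a dict of column/value equality filters."""
    clauses = []
    for column, value in sorted(filters.items()):
        # Accept only simple identifiers so a column name cannot smuggle in SQL
        if not column.replace("_", "").isalnum():
            raise ValueError(f"Unexpected column name: {column!r}")
        if isinstance(value, (int, float)):
            clauses.append(f"{column} = {value}")
        else:
            # Escape single quotes by doubling them, as in standard SQL string literals
            escaped = str(value).replace("'", "''")
            clauses.append(f"{column} = '{escaped}'")
    return " AND ".join(clauses)


criteria = build_criteria({"storage_class": "STANDARD", "record_type": "CREATE"})
print(criteria)
```

&lt;p&gt;The resulting string can then be passed as the &lt;code&gt;criteria&lt;/code&gt; argument of the function above.&lt;/p&gt;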



</description>
      <category>aws</category>
      <category>s3</category>
      <category>analytics</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Brief Notes on AWS CodeDeploy</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Thu, 21 Mar 2024 19:04:04 +0000</pubDate>
      <link>https://dev.to/aws-builders/brief-notes-on-aws-codedeploy-2731</link>
      <guid>https://dev.to/aws-builders/brief-notes-on-aws-codedeploy-2731</guid>
      <description>&lt;p&gt;AWS CodeDeploy is a service that automates code deployments to any instance, including Amazon EC2 instances and instances running on-premises.&lt;/p&gt;

&lt;h2&gt;
  
  
  Supported Platforms/Deployment Types:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;EC2/On-Premises: In-Place or Blue/Green Deployments&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Describes instances of physical servers that can be Amazon EC2 cloud instances, on-premises servers, or both. Applications created using the EC2/On-Premises compute platform can be composed of executable files, configuration files, images, and more.&lt;/li&gt;
&lt;li&gt;Deployments that use the EC2/On-Premises compute platform manage the way in which traffic is directed to instances by using an in-place or blue/green deployment type.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;AWS Lambda: Canary, Linear, All-At-Once Deployments&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Applications created using the AWS Lambda compute platform can manage the way in which traffic is directed to the updated Lambda function versions during a deployment by choosing a canary, linear, or all-at-once configuration.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Amazon ECS: Blue/Green Deployment&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Used to deploy an Amazon ECS containerized application as a task set. &lt;/li&gt;
&lt;li&gt;CodeDeploy performs a blue/green deployment by installing an updated version of the containerized application as a new replacement task set. CodeDeploy reroutes production traffic from the original application, or task set, to the replacement task set. The original task set is terminated after a successful deployment.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Deployment approach for EC2
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Deploys a revision to a set of instances.&lt;/li&gt;
&lt;li&gt;Deploys a new revision that consists of an application and AppSpec file. The AppSpec specifies how to deploy the application to the instances in a deployment group.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezsau98rpjq1qlkvy69j.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezsau98rpjq1qlkvy69j.jpg" alt="URL" width="635" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment approach for Lambda
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Deploys a new version of a serverless Lambda function on a high-availability compute infrastructure.&lt;/li&gt;
&lt;li&gt;Shifts production traffic from one version of a Lambda function to a new version of the same function. The AppSpec file specifies which Lambda function version to deploy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gabfs5volddde900t0u.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gabfs5volddde900t0u.jpg" alt="url" width="660" height="290"&gt;&lt;/a&gt;&lt;/p&gt;
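
&lt;p&gt;The difference between the canary, linear, and all-at-once configurations is easiest to see as a schedule of traffic percentages. The following Python sketch is illustrative only: &lt;code&gt;traffic_schedule&lt;/code&gt; is a hypothetical helper, not a CodeDeploy API, and it assumes the first shift happens at the start of the deployment.&lt;/p&gt;

```python
def traffic_schedule(kind, percent=10, interval_minutes=5):
    """Return (minute, percent_on_new_version) steps for a traffic shift strategy."""
    if kind == "all_at_once":
        return [(0, 100)]
    if percent not in range(1, 101):
        raise ValueError("percent must be between 1 and 100")
    if kind == "canary":
        # One small first step, then the remainder after the interval
        return [(0, percent), (interval_minutes, 100)]
    if kind == "linear":
        steps = []
        shifted, minute = 0, 0
        # Equal increments at equal intervals until all traffic is shifted
        while shifted != 100:
            shifted = min(100, shifted + percent)
            steps.append((minute, shifted))
            minute += interval_minutes
        return steps
    raise ValueError(f"Unknown strategy: {kind}")


print(traffic_schedule("canary", percent=10, interval_minutes=5))
print(traffic_schedule("linear", percent=25, interval_minutes=1))
```

&lt;p&gt;A canary run sends a small share of traffic first and the rest in one step, while a linear run moves the same share at every interval.&lt;/p&gt;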

&lt;h2&gt;
  
  
  Deployment approach for ECS
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Deploys an updated version of an Amazon ECS containerized application as a new, replacement task set. CodeDeploy reroutes production traffic from the task set with the original version to the new replacement task set with the updated version. When the deployment completes, the original task set is terminated.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobk0qy9jw9jw03ddevli.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobk0qy9jw9jw03ddevli.jpg" alt="URL" width="660" height="294"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  App Spec File
&lt;/h2&gt;

&lt;p&gt;The application specification file (AppSpec file) is a YAML-formatted or JSON-formatted file used by CodeDeploy to manage a deployment. Note: the name of the AppSpec file for an EC2/On-Premises deployment must be appspec.yml. The name of the AppSpec file for an Amazon ECS or AWS Lambda deployment must be appspec.yml.&lt;/p&gt;

&lt;p&gt;For ECS, the AppSpec file specifies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The container and port in the replacement task set to which your Application Load Balancer or Network Load Balancer reroutes traffic during a deployment. This is specified with the LoadBalancerInfo instruction in the AppSpec file.&lt;/li&gt;
&lt;li&gt;Amazon ECS task definition file. This is specified with its ARN in the TaskDefinition instruction in the AppSpec file.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For Lambda, the AppSpec file specifies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda function version to deploy.&lt;/li&gt;
&lt;li&gt;Lambda functions to use as validation tests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For EC2, the AppSpec file specifies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which lifecycle event hooks to run in response to deployment lifecycle events.&lt;/li&gt;
&lt;/ul&gt;
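
&lt;p&gt;As an illustration of the ECS items above, the Python sketch below builds a JSON-formatted AppSpec with the TaskDefinition and LoadBalancerInfo entries. The ARN, container name, and port are hypothetical placeholders, not values from a real deployment.&lt;/p&gt;

```python
import json

# Hypothetical values: replace the ARN, container name, and port with your own
appspec = {
    "version": 0.0,
    "Resources": [
        {
            "TargetService": {
                "Type": "AWS::ECS::Service",
                "Properties": {
                    "TaskDefinition": "arn:aws:ecs:region:account_id:task-definition/my-task:1",
                    "LoadBalancerInfo": {
                        "ContainerName": "my-container",
                        "ContainerPort": 80,
                    },
                },
            }
        }
    ],
}

# Serialize to the JSON form of the AppSpec file
appspec_json = json.dumps(appspec, indent=2)
print(appspec_json)
```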

</description>
      <category>aws</category>
      <category>devops</category>
      <category>cicd</category>
    </item>
    <item>
      <title>Bedrock Agent &amp; Tools - Tracing Best practises</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Wed, 20 Mar 2024 17:58:52 +0000</pubDate>
      <link>https://dev.to/aws-builders/bedrock-agent-tools-tracing-best-practises-4217</link>
      <guid>https://dev.to/aws-builders/bedrock-agent-tools-tracing-best-practises-4217</guid>
      <description>&lt;p&gt;Most Bedrock Agent users will have a use case where multiple Lambda functions are implemented with a Bedrock Agent to perform different tasks, and will be looking for guidance on debugging the API calls and responses from the agent and the Lambda functions.&lt;/p&gt;

&lt;p&gt;Here are some of the approaches that we have been using and have found quite effective for tracking and tracing agents and the usage of their tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enable Tracing for the Agent: When invoking the agent, set the &lt;code&gt;enableTrace&lt;/code&gt; parameter to &lt;code&gt;true&lt;/code&gt;. This enables detailed tracing of the agent's execution, including the tools (Lambda functions) invoked and their responses. The trace events are returned as part of the agent's response stream. [1] Example (Python, using boto3): &lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
response = boto3.client("bedrock-agent-runtime").invoke_agent(agentId=agent_id, agentAliasId=agent_alias_id, sessionId=session_id, inputText=query, enableTrace=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
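
&lt;p&gt;With boto3, the agent is invoked through the &lt;code&gt;invoke_agent&lt;/code&gt; call of the &lt;code&gt;bedrock-agent-runtime&lt;/code&gt; client, and when &lt;code&gt;enableTrace=True&lt;/code&gt; is set the trace events arrive in the same &lt;code&gt;completion&lt;/code&gt; event stream as the answer chunks. The sketch below separates the two. &lt;code&gt;split_completion&lt;/code&gt; is a hypothetical helper name, and the sample events are made up for illustration.&lt;/p&gt;

```python
def split_completion(events):
    """Split an invoke_agent completion stream into answer text and trace entries."""
    answer_parts = []
    traces = []
    for event in events:
        if "chunk" in event:
            # Answer text arrives as UTF-8 encoded bytes
            answer_parts.append(event["chunk"]["bytes"].decode("utf-8"))
        elif "trace" in event:
            # Each trace event wraps the actual trace under a nested "trace" key
            traces.append(event["trace"].get("trace", {}))
    return "".join(answer_parts), traces


# Made-up events in the shape of the response stream, for illustration only
sample_events = [
    {"trace": {"trace": {"orchestrationTrace": {"rationale": {"text": "Look up the order"}}}}},
    {"chunk": {"bytes": b"Your order has shipped."}},
]
answer, traces = split_completion(sample_events)
print(answer)
print(len(traces))
```

&lt;p&gt;In a real call, the &lt;code&gt;completion&lt;/code&gt; field of the &lt;code&gt;invoke_agent&lt;/code&gt; response would be passed as &lt;code&gt;events&lt;/code&gt;.&lt;/p&gt;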

&lt;ul&gt;
&lt;li&gt;Log Within Lambda Functions: Within each of your Lambda functions (tools), add logging statements to capture relevant information and events. You can use AWS Lambda's built-in logging capabilities or integrate with a centralized logging service like Amazon CloudWatch Logs. [2] Example (Python): &lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)  # make sure INFO messages are emitted

def lambda_handler(event, context):
    logger.info(f"Received event: {event}")
    # Your Lambda function's logic here
    result = {"status": "ok"}  # placeholder result
    logger.info(f"Returning result: {result}")
    return result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;Correlate Logs Using Request IDs or Tracing IDs: To correlate logs across multiple Lambda functions and the agent, you can use request IDs or tracing IDs. Pass a unique ID as part of the event or context to your Lambda functions and include it in your log statements. This will allow you to trace the flow of events across different components of your system.&lt;/li&gt;
&lt;/ul&gt;

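&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import logging
import uuid

def lambda_handler(event, context):
    # Reuse the caller's request ID if one was passed, otherwise generate one
    request_id = event.get("request_id", str(uuid.uuid4()))
    base_logger = logging.getLogger(__name__)
    base_logger.setLevel(logging.INFO)
    # The adapter attaches request_id to every record; add %(request_id)s to the log format to print it
    logger = logging.LoggerAdapter(base_logger, {"request_id": request_id})
    logger.info(f"Received event: {event}")
    result = {"request_id": request_id}  # placeholder for your function's logic
    logger.info(f"Returning result: {result}")
    return result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;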
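
&lt;p&gt;A &lt;code&gt;LoggerAdapter&lt;/code&gt; only attaches the request ID to each log record; it appears in the output once the log format references it. The following self-contained sketch shows that wiring outside Lambda, writing to an in-memory stream. &lt;code&gt;make_request_logger&lt;/code&gt; and the &lt;code&gt;req-123&lt;/code&gt; ID are hypothetical names used for illustration.&lt;/p&gt;

```python
import io
import logging


def make_request_logger(request_id, stream):
    """Return a logger adapter whose output lines carry the given request ID."""
    handler = logging.StreamHandler(stream)
    # %(request_id)s is filled from the adapter's extra dict
    handler.setFormatter(logging.Formatter("%(levelname)s [%(request_id)s] %(message)s"))
    base = logging.getLogger(f"tool.{request_id}")
    base.setLevel(logging.INFO)
    base.propagate = False  # avoid duplicate output through the root logger
    base.addHandler(handler)
    return logging.LoggerAdapter(base, {"request_id": request_id})


buffer = io.StringIO()
log = make_request_logger("req-123", buffer)
log.info("Received event")
print(buffer.getvalue().strip())
```

&lt;p&gt;Searching the logs for the same ID across all tools then shows the full path of one agent request.&lt;/p&gt;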

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Use AWS X-Ray for Distributed Tracing: AWS X-Ray is a service that can help you analyze and debug distributed applications, including Lambda functions. By integrating X-Ray with your Bedrock application, you can trace requests as they travel through your Lambda functions and gain insights into their performance and potential issues. [3]&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enable X-Ray tracing for your Lambda functions by adding the necessary configuration.&lt;/li&gt;
&lt;li&gt;Instrument your Lambda functions with X-Ray tracing code to capture relevant information and events.&lt;/li&gt;
&lt;li&gt;Use the X-Ray console or integrate with other monitoring tools to analyze the traces and identify potential bottlenecks or issues.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Implement Advanced Prompts: By using advanced prompts, you can enhance your agent's accuracy by modifying the prompt templates to provide detailed configurations. You can also provide hand-curated examples for few-shot prompting, in which you improve model performance by providing labeled examples for a specific task. [4]&lt;/p&gt;

&lt;p&gt;By combining the built-in tracing mechanism, custom logging within your Lambda functions, and distributed tracing with AWS X-Ray, you can gain better visibility into the API calls, events, and interactions happening within your Bedrock agent and its associated tools. This can help you debug issues more effectively and trace errors back to their source across multiple Lambda functions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Reference&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/trace-events.html" rel="noopener noreferrer"&gt;Trace events in Amazon Bedrock - Amazon Bedrock&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/operatorguide/best-practices-debugging.html" rel="noopener noreferrer"&gt;Best practices for your debugging environment - AWS Lambda&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/xray/latest/devguide/aws-xray.html" rel="noopener noreferrer"&gt;What is AWS X-Ray? - AWS X-Ray &lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/advanced-prompts.html" rel="noopener noreferrer"&gt;Advanced prompts in Amazon Bedrock - Amazon Bedrock &lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

</description>
      <category>sagemaker</category>
      <category>aws</category>
      <category>bedrock</category>
    </item>
    <item>
      <title>AWS DEV OPS Professional Exam short notes</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Sun, 17 Mar 2024 05:55:58 +0000</pubDate>
      <link>https://dev.to/aws-builders/aws-dev-ops-professional-exam-short-notes-4b47</link>
      <guid>https://dev.to/aws-builders/aws-dev-ops-professional-exam-short-notes-4b47</guid>
      <description>&lt;p&gt;Over the last few weeks I have been preparing for this exam, and I have summarized the key notes below for quick reference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Notes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You can use CloudWatch Logs to monitor applications and systems using log data. For example, CloudWatch Logs can track the number of errors that occur in your application logs and send you a notification whenever the rate of errors exceeds a threshold you specify. CloudWatch Logs uses your log data for monitoring, so no code changes are required. For more information on CloudWatch Logs, please refer to the link below: &lt;a href="http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html&lt;/a&gt; The correct answer is: Install the CloudWatch Logs Agent on your AMI, and configure the CloudWatch Logs Agent to stream your logs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can add another layer of protection by enabling MFA Delete on a versioned bucket. Once you do so, you must provide your AWS account’s access keys and a valid code from the account’s MFA device in order to permanently delete an object version or suspend or reactivate versioning on the bucket. For more information on MFA, please refer to the link below: &lt;a href="https://aws.amazon.com/blogs/security/securing-access-to-aws-using-mfa-part-3/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/security/securing-access-to-aws-using-mfa-part-3/&lt;/a&gt; IAM roles are designed so that your applications can securely make API requests from your instances, without requiring you to manage the security credentials that the applications use. Instead of creating and distributing your AWS credentials, you can delegate permission to make API requests using IAM roles. For more information on roles for EC2, please refer to the link below: &lt;a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;As your infrastructure grows, common patterns can emerge in which you declare the same components in each of your templates. You can separate out these common components and create dedicated templates for them. That way, you can mix and match different templates but use nested stacks to create a single, unified stack. Nested stacks are stacks that create other stacks. To create nested stacks, use the AWS::CloudFormation::Stack resource in your template to reference other templates. For more information on best practices for CloudFormation, please refer to the link below: &lt;a href="http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/best-practices.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/best-practices.html&lt;/a&gt; The correct answer is: Separate the AWS CloudFormation template into a nested structure that has individual templates for the resources that are to be governed by different departments, and use the outputs from the networking and security stacks for the application template that you control.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can use Amazon CloudWatch Logs to monitor, store, and access your log files from Amazon Elastic Compute Cloud (Amazon EC2) instances, AWS CloudTrail, and other sources. You can then retrieve the associated log data from CloudWatch Logs. For more information on CloudWatch Logs, please refer to the link below: &lt;a href="http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html&lt;/a&gt; You can then use Kinesis to process those logs. For more information on Amazon Kinesis, please refer to the link below: &lt;a href="http://docs.aws.amazon.com/streams/latest/dev/introduction.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/streams/latest/dev/introduction.html&lt;/a&gt; The correct answers are: Using AWS CloudFormation, create a CloudWatch Logs LogGroup and send the operating system and application logs of interest using the CloudWatch Logs Agent; and, using configuration management, set up remote logging to send events to Amazon Kinesis and insert these into Amazon CloudSearch or Amazon Redshift, depending on available analytic tools.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;IAM roles are designed so that your applications can securely make API requests from your instances, without requiring you to manage the security credentials that the applications use. Instead of creating and distributing your AWS credentials, you can delegate permission to make API requests using IAM roles. For more information on IAM roles, please refer to the link below: &lt;a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The AWS Security Token Service (STS) is a web service that enables you to request temporary, limited-privilege credentials for AWS Identity and Access Management (IAM) users or for users that you authenticate (federated users). The token can then be used to grant access to the objects in S3. You can then provide access to the objects based on the key values generated via the user ID.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;As your infrastructure grows, common patterns can emerge in which you declare the same components in each of your templates. You can separate out these common components and create dedicated templates for them. That way, you can mix and match different templates but use nested stacks to create a single, unified stack. Nested stacks are stacks that create other stacks. To create nested stacks, use the AWS::CloudFormation::Stack resource in your template to reference other templates. For more information on CloudFormation best practices, please refer to the link below: &lt;a href="http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/best-practices.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/best-practices.html&lt;/a&gt; The correct answer is: Create separate templates based on functionality, create nested stacks with CloudFormation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The default Auto Scaling termination policy is designed to help ensure that your network architecture spans Availability Zones evenly. When using the default termination policy, Auto Scaling selects an instance to terminate as follows: Auto Scaling determines whether there are instances in multiple Availability Zones. If so, it selects the Availability Zone with the most instances and at least one instance that is not protected from scale in. If there is more than one Availability Zone with this number of instances, Auto Scaling selects the Availability Zone with the instances that use the oldest launch configuration. For more information on Auto Scaling instance termination, please refer to the link below: &lt;a href="http://docs.aws.amazon.com/autoscaling/latest/userguide/as-instance-termination.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/autoscaling/latest/userguide/as-instance-termination.html&lt;/a&gt; The correct answer is: Auto Scaling will select the AZ with 4 EC2 instances and terminate an instance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Amazon RDS Read Replicas provide enhanced performance and durability for database (DB) instances. This replication feature makes it easy to elastically scale out beyond the capacity constraints of a single DB instance for read-heavy database workloads. You can create one or more replicas of a given source DB instance and serve high-volume application read traffic from multiple copies of your data, thereby increasing aggregate read throughput. Sharding is a common concept to split data across multiple tables in a database. Shard your data set among multiple Amazon RDS DB instances. Amazon ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud. The service improves the performance of web applications by allowing you to retrieve information from fast, managed, in-memory data stores, instead of relying entirely on slower disk-based databases.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In a worker environment, Elastic Beanstalk simplifies queue processing by managing the Amazon SQS queue and running a daemon process on each instance that reads from the queue for you. When the daemon pulls an item from the queue, it sends an HTTP POST request locally to &lt;a href="http://localhost/" rel="noopener noreferrer"&gt;http://localhost/&lt;/a&gt; with the contents of the queue message in the body. All that your application needs to do is perform the long-running task in response to the POST. For more information on managing Elastic Beanstalk worker environments, please visit the URL below: &lt;a href="http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you suspend AddToLoadBalancer, Auto Scaling launches the instances but does not add them to the load balancer or target group. If you resume the AddToLoadBalancer process, Auto Scaling resumes adding instances to the load balancer or target group when they are launched. However, Auto Scaling does not add the instances that were launched while this process was suspended. You must register those instances manually. For more information on the Suspension and Resumption process, please visit the below URL: &lt;a href="http://docs.aws.amazon.com/autoscaling/latest/userguide/as-suspend-resume-processes.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/autoscaling/latest/userguide/as-suspend-resume-processes.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can use the container_commands key of elastic beanstalk to execute commands that affect your application source code. Container commands run after the application and web server have been set up and the application version archive has been extracted, but before the application version is deployed. Non-container commands and other customization operations are performed prior to the application source code being extracted. You can use leader_only to only run the command on a single instance, or configure a test to only run the command when a test command evaluates to true. Leader-only container commands are only executed during environment creation and deployments, while other commands and server customization operations are performed every time an instance is provisioned or updated. Leader-only container commands are not executed due to launch configuration changes, such as a change in the AMI Id or instance type. For more information on customizing containers, please visit the below URL: &lt;a href="http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html&lt;/a&gt; The correct answer is: Use a “Container command” within an Elastic Beanstalk configuration file to execute the script, ensuring that the “leader only” flag is set to true.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A Dockerrun.aws.json file is an Elastic Beanstalk–specific JSON file that describes how to deploy a set of Docker containers as an Elastic Beanstalk application. You can use a Dockerrun.aws.json file for a multicontainer Docker environment. Dockerrun.aws.json describes the containers to deploy to each container instance in the environment as well as the data volumes to create on the host instance for the containers to mount. &lt;a href="http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_docker_v2config.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_docker_v2config.html&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Elastic Beanstalk supports the deployment of web applications from Docker containers. With Docker containers, you can define your own runtime environment. You can choose your own platform, programming language, and any application dependencies (such as package managers or tools), that aren’t supported by other platforms. Docker containers are self-contained and include all the configuration information and software your web application requires to run.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When you see Amazon Kinesis as an option, this becomes the ideal option to process data in real time. Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. Amazon Kinesis offers key capabilities to cost effectively process streaming data at any scale, along with the flexibility to choose the tools that best suit the requirements of your application. With Amazon Kinesis, you can ingest real-time data such as application logs, website clickstreams, IoT telemetry data, and more into your databases, data lakes and data warehouses, or build your own real-time applications using this data. For more information on Amazon Kinesis, please visit the below URL: &lt;a href="https://aws.amazon.com/kinesis" rel="noopener noreferrer"&gt;https://aws.amazon.com/kinesis&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can use CloudWatch Logs to monitor applications and systems using log data. CloudWatch Logs uses your log data for monitoring, so no code changes are required. For example, you can monitor application logs for specific literal terms (such as “NullReferenceException”) or count the number of occurrences of a literal term at a particular position in log data (such as “404” status codes in an Apache access log). When the term you are searching for is found, CloudWatch Logs reports the data to a CloudWatch metric that you specify. Log data is encrypted while in transit and while it is at rest. For more information on CloudWatch Logs, please refer to the link below: &lt;a href="http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html&lt;/a&gt; Amazon CloudWatch uses Amazon SNS to send email. First, create and subscribe to an SNS topic. When you create a CloudWatch alarm, you can add this SNS topic to send an email notification when the alarm changes state. The correct answers are: Install a CloudWatch Logs Agent on your servers to stream web application logs to CloudWatch; create a CloudWatch Logs group and define metric filters that capture 500 Internal Server Errors, then set a CloudWatch alarm on that metric; and use Amazon Simple Notification Service to notify an on-call engineer when a CloudWatch alarm is triggered.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When you provision an Amazon EC2 instance in an AWS CloudFormation stack, you might specify additional actions to configure the instance, such as install software packages or bootstrap applications. Normally, CloudFormation proceeds with stack creation after the instance has been successfully created. However, you can use a CreationPolicy so that CloudFormation proceeds with stack creation only after your configuration actions are done. That way you’ll know your applications are ready to go after stack creation succeeds.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Auto Scaling periodically performs health checks on the instances in your Auto Scaling group and identifies any instances that are unhealthy. You can configure Auto Scaling to determine the health status of an instance using Amazon EC2 status checks, Elastic Load Balancing health checks, or custom health checks. By default, Auto Scaling health checks use the results of the EC2 status checks to determine the health status of an instance. Auto Scaling marks an instance as unhealthy if it fails one or more of the status checks. For more information on monitoring in Auto Scaling, please visit the URL below: &lt;a href="http://docs.aws.amazon.com/autoscaling/latest/userguide/as-monitoring-features.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/autoscaling/latest/userguide/as-monitoring-features.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You need to have a custom health check that evaluates the application functionality. The normal health checks are not enough: if the application functionality does not work and you don’t have custom health checks, the instances will still be deemed healthy. If you have custom health checks, you can send the information from your health checks to Auto Scaling so that Auto Scaling can use this information. For example, if you determine that an instance is not functioning as expected, you can set the health status of the instance to Unhealthy. The next time that Auto Scaling performs a health check on the instance, it will determine that the instance is unhealthy and then launch a replacement instance. For more information on Auto Scaling health checks, please refer to the AWS documentation linked below: &lt;a href="http://docs.aws.amazon.com/autoscaling/latest/userguide/healthcheck.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/autoscaling/latest/userguide/healthcheck.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A blue group carries the production load while a green group is staged and deployed with the new code. When it’s time to deploy, you simply attach the green group to the existing load balancer to introduce traffic to the new environment. For HTTP/HTTPS listeners, the load balancer favors the green Auto Scaling group because it uses a least outstanding requests routing algorithm. As you scale up the green Auto Scaling group, you can take blue Auto Scaling group instances out of service by either terminating them or putting them in Standby state. For more information on Blue Green Deployments, please refer to the AWS whitepaper linked below: &lt;a href="https://d0.awsstatic.com/whitepapers/AWS_Blue_Green_Deployments.pdf" rel="noopener noreferrer"&gt;https://d0.awsstatic.com/whitepapers/AWS_Blue_Green_Deployments.pdf&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;First ensure that the CloudFormation template is updated with the new instance type. The AWS::AutoScaling::AutoScalingGroup resource supports an UpdatePolicy attribute. This is used to define how an Auto Scaling group resource is updated when an update to the CloudFormation stack occurs. A common approach to updating an Auto Scaling group is to perform a rolling update, which is done by specifying the AutoScalingRollingUpdate policy. This retains the same Auto Scaling group and replaces old instances with new ones, according to the parameters specified.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;With web identity federation, you don’t need to create custom sign-in code or manage your own user identities. Instead, users of your app can sign in using a well-known identity provider (IdP) —such as Login with Amazon, Facebook, Google, or any other OpenID Connect (OIDC)-compatible IdP, receive an authentication token, and then exchange that token for temporary security credentials in AWS that map to an IAM role with permissions to use the resources in your AWS account. Using an IdP helps you keep your AWS account secure, because you don’t have to embed and distribute long-term security credentials with your application.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The optional Conditions section includes statements that define when a resource is created or when a property is defined. For example, you can compare whether a value is equal to another value. Based on the result of that condition, you can conditionally create resources. If you have multiple conditions, separate them with commas. You might use conditions when you want to reuse a template that can create resources in different contexts, such as a test environment versus a production environment. In your template, you can add an EnvironmentType input parameter, which accepts either prod or test as inputs. For the production environment, you might include Amazon EC2 instances with certain capabilities; however, for the test environment, you want to use reduced capabilities to save money. With conditions, you can define which resources are created and how they’re configured for each environment type. For more information on Cloudformation conditions please refer to the below link: &lt;a href="http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/conditions-section-structure.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/conditions-section-structure.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Elastic Beanstalk already has the facility to manage various versions, so you don’t need to use S3 separately for this. AWS Elastic Beanstalk is a natural fit for developers who need to maintain application versions. With AWS Elastic Beanstalk, you can quickly deploy and manage applications in the AWS Cloud without worrying about the infrastructure that runs those applications. AWS Elastic Beanstalk reduces management complexity without restricting choice or control. You simply upload your application, and AWS Elastic Beanstalk automatically handles the details of capacity provisioning, load balancing, scaling, and application health monitoring.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The first step in using Elastic Beanstalk is to create an application, which represents your web application in AWS. In Elastic Beanstalk an application serves as a container for the environments that run your web app, and versions of your web app’s source code, saved configurations, logs and other artifacts that you create while using Elastic Beanstalk. For more information on Applications, please refer to the below link: &lt;a href="http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/applications.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/applications.html&lt;/a&gt; Deploying a new version of your application to an environment is typically a fairly quick process. The new source bundle is deployed to an instance and extracted, and then the web container or application server picks up the new version and restarts if necessary. During deployment, your application might still become unavailable to users for a few seconds. You can prevent this by configuring your environment to use rolling deployments to deploy the new version to instances in batches. For more information on deployment, please refer to the below link: &lt;a href="http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.deploy-existing-version.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.deploy-existing-version.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Weighted routing lets you associate multiple resources with a single domain name (example.com) or subdomain name (acme.example.com) and choose how much traffic is routed to each resource. This can be useful for a variety of purposes, including load balancing and testing new versions of software. For more information on the Routing policy please refer to the below link: &lt;a href="http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Amazon Elasticsearch Service makes it easy to deploy, operate, and scale Elasticsearch for log analytics, full text search, application monitoring, and more. Amazon Elasticsearch Service is a fully managed service that delivers Elasticsearch’s easy-to-use APIs and real-time capabilities along with the availability, scalability, and security required by production workloads. The service offers built-in integrations with Kibana, Logstash, and AWS services including Amazon Kinesis Firehose, AWS Lambda, and Amazon CloudWatch so that you can go from raw data to actionable insights quickly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can use CloudWatch Logs to monitor applications and systems using log data. For example, CloudWatch Logs can track the number of errors that occur in your application logs and send you a notification whenever the rate of errors exceeds a threshold you specify. CloudWatch Logs uses your log data for monitoring; so, no code changes are required. For example, you can monitor application logs for specific literal terms (such as “NullReferenceException”) or count the number of occurrences of a literal term at a particular position in log data (such as “404” status codes in an Apache access log). When the term you are searching for is found, CloudWatch Logs reports the data to a CloudWatch metric that you specify. For more information on Cloudwatch Logs please refer to the below link: &lt;a href="http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html&lt;/a&gt; Amazon CloudWatch uses Amazon SNS to send email. First, create and subscribe to an SNS topic. When you create a CloudWatch alarm, you can add this SNS topic to send an email notification when the alarm changes state. For more information on Cloudwatch and SNS please refer to the below link: &lt;a href="http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/US_SetupSNS.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/US_SetupSNS.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AWS OpsWorks is a configuration management service that uses Chef, an automation platform that treats server configurations as code. OpsWorks uses Chef to automate how servers are configured, deployed, and managed across your Amazon Elastic Compute Cloud (Amazon EC2) instances or on-premises compute environments. OpsWorks has two offerings, AWS OpsWorks for Chef Automate, and AWS OpsWorks Stacks. For more information on OpsWorks please refer to the below link: &lt;a href="https://aws.amazon.com/opsworks/" rel="noopener noreferrer"&gt;https://aws.amazon.com/opsworks/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can use Kinesis Streams for rapid and continuous data intake and aggregation. The type of data used includes IT infrastructure log data, application logs, social media, market data feeds, and web clickstream data. Because the response time for the data intake and processing is in real time, the processing is typically lightweight. The following are typical scenarios for using Kinesis Streams: Accelerated log and data feed intake and processing – You can have producers push data directly into a stream. For example, push system and application logs and they’ll be available for processing in seconds. This prevents the log data from being lost if the front end or application server fails. Kinesis Streams provides accelerated data feed intake because you don’t batch the data on the servers before you submit it for intake. Real-time metrics and reporting – You can use data collected into Kinesis Streams for simple data analysis and reporting in real time. For example, your data-processing application can work on metrics and reporting for system and application logs as the data is streaming in, rather than wait to receive batches of data. For more information on Amazon Kinesis Streams please refer to the below link: &lt;a href="http://docs.aws.amazon.com/streams/latest/dev/introduction.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/streams/latest/dev/introduction.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;With Elastic Beanstalk, you can quickly deploy and manage applications in the AWS Cloud without worrying about the infrastructure that runs those applications. AWS Elastic Beanstalk reduces management complexity without restricting choice or control. You simply upload your application, and Elastic Beanstalk automatically handles the details of capacity provisioning, load balancing, scaling, and application health monitoring. For more information on Elastic Beanstalk please refer to the below link: &lt;a href="http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/Welcome.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/Welcome.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can use intrinsic functions, such as Fn::If, Fn::Equals, and Fn::Not, to conditionally create stack resources. These conditions are evaluated based on input parameters that you declare when you create or update a stack. After you define all your conditions, you can associate them with resources or resource properties in the Resources and Outputs sections of a template.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Amazon RDS Multi-AZ deployments provide enhanced availability and durability for Database (DB) Instances, making them a natural fit for production database workloads. When you provision a Multi-AZ DB Instance, Amazon RDS automatically creates a primary DB Instance and synchronously replicates the data to a standby instance in a different Availability Zone (AZ). Each AZ runs on its own physically distinct, independent infrastructure, and is engineered to be highly reliable. In case of an infrastructure failure, Amazon RDS performs an automatic failover to the standby (or to a read replica in the case of Amazon Aurora), so that you can resume database operations as soon as the failover is complete.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can use AWS CloudTrail to get a history of AWS API calls and related events for your account. This history includes calls made with the AWS Management Console, AWS Command Line Interface, AWS SDKs, and other AWS services. For more information on CloudTrail, please visit the below URL: &lt;a href="http://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html&lt;/a&gt; Amazon CloudWatch Events delivers a near real-time stream of system events that describe changes in Amazon Web Services (AWS) resources. Using simple rules that you can quickly set up, you can match events and route them to one or more target functions or streams. CloudWatch Events becomes aware of operational changes as they occur. CloudWatch Events responds to these operational changes and takes corrective action as necessary, by sending messages to respond to the environment, activating functions, making changes, and capturing state information.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;By default, all AWS accounts are limited to 5 Elastic IP addresses per region, because public (IPv4) Internet addresses are a scarce public resource. We strongly encourage you to use an Elastic IP address primarily for the ability to remap the address to another instance in the case of instance failure, and to use DNS hostnames for all other inter-node communication.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can manage Amazon SQS messages with Amazon S3. This is especially useful for storing and consuming messages with a message size of up to 2 GB. To manage Amazon SQS messages with Amazon S3, use the Amazon SQS Extended Client Library for Java. Specifically, you use this library to: Specify whether messages are always stored in Amazon S3 or only when a message’s size exceeds 256 KB. Send a message that references a single message object stored in an Amazon S3 bucket. Get the corresponding message object from an Amazon S3 bucket. Delete the corresponding message object from an Amazon S3 bucket. For more information on processing large messages for SQS, please visit the below URL: &lt;a href="http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-s3-messages.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-s3-messages.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AWS CloudFormation provisions and configures resources by making calls to the AWS services that are described in your template. After all the resources have been created, AWS CloudFormation reports that your stack has been created. You can then start using the resources in your stack. If stack creation fails, AWS CloudFormation rolls back your changes by deleting the resources that it created. The below snapshot from CloudFormation shows what happens when there is an error in the stack creation. For more information on how CloudFormation works, please refer to the below link: &lt;a href="http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-whatis-howdoesitwork.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-whatis-howdoesitwork.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Because Elastic Beanstalk performs an in-place update when you update your application versions, your application may become unavailable to users for a short period of time. It is possible to avoid this downtime by performing a blue/green deployment, where you deploy the new version to a separate environment, and then swap CNAMEs of the two environments to redirect traffic to the new version instantly. Blue/green deployments require that your environment runs independently of your production database, if your application uses one. If your environment has an Amazon RDS DB instance attached to it, the data will not transfer over to your second environment, and will be lost if you terminate the original environment. For more information on Blue Green deployments with Elastic beanstalk , please refer to the below link: &lt;a href="http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.CNAMESwap.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.CNAMESwap.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Amazon RDS Read Replicas provide enhanced performance and durability for database (DB) instances. This replication feature makes it easy to elastically scale out beyond the capacity constraints of a single DB Instance for read-heavy database workloads. You can create one or more replicas of a given source DB Instance and serve high-volume application read traffic from multiple copies of your data, thereby increasing aggregate read throughput. Read replicas can also be promoted when needed to become standalone DB instances.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Amazon Route 53 health checks monitor the health and performance of your web applications, web servers, and other resources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you use SSL termination, your servers will always get non-secure connections and will never know whether users used a more secure channel or not. If you are using Elastic beanstalk to configure the ELB, you can use the below article to ensure end to end encryption. &lt;a href="http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/configuring-https-endtoend.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/configuring-https-endtoend.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Amazon Kinesis Firehose is the easiest way to load streaming data into AWS. It can capture, transform, and load streaming data into Amazon Kinesis Analytics, Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service, enabling near real-time analytics with existing business intelligence tools and dashboards you’re already using today. It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security. For more information on Kinesis firehose, please visit the below URL: &lt;a href="https://aws.amazon.com/kinesis/firehose/" rel="noopener noreferrer"&gt;https://aws.amazon.com/kinesis/firehose/&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. This enables you to use your data to acquire new insights for your business and customers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use Cloudfront distribution for distributing the heavy reads for your application. You can create a zone apex record to point to the Cloudfront distribution. You can control how long your objects stay in a CloudFront cache before CloudFront forwards another request to your origin. Reducing the duration allows you to serve dynamic content. Increasing the duration means your users get better performance because your objects are more likely to be served directly from the edge cache. A longer duration also reduces the load on your origin.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Amazon EBS encryption offers you a simple encryption solution for your EBS volumes without the need for you to build, maintain, and secure your own key management infrastructure. When you create an encrypted EBS volume and attach it to a supported instance type, the following types of data are encrypted: data at rest inside the volume, all data moving between the volume and the instance, and all snapshots created from the volume. Snapshots that are taken from encrypted volumes are automatically encrypted. Volumes that are created from encrypted snapshots are also automatically encrypted.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A tag is a label that you or AWS assigns to an AWS resource. Each tag consists of a key and a value. A key can have more than one value. You can use tags to organize your resources, and cost allocation tags to track your AWS costs on a detailed level. After you activate cost allocation tags, AWS uses the cost allocation tags to organize your resource costs on your cost allocation report, to make it easier for you to categorize and track your AWS costs. AWS provides two types of cost allocation tags, an AWS-generated tag and user-defined tags. AWS defines, creates, and applies the AWS-generated tag for you, and you define, create, and apply user-defined tags. You must activate both types of tags separately before they can appear in Cost Explorer or on a cost allocation report.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can monitor the progress of a stack update by viewing the stack’s events. The console’s Events tab displays each major step in the creation and update of the stack, sorted by the time of each event with the latest events on top. The start of the stack update process is marked with an UPDATE_IN_PROGRESS event for the stack. For more information on monitoring your stack, please visit the below URL: &lt;a href="http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-monitor-stack.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-monitor-stack.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A placement group is a logical grouping of instances within a single Availability Zone. Placement groups are recommended for applications that benefit from low network latency, high network throughput, or both. To provide the lowest latency, and the highest packet-per-second network performance for your placement group, choose an instance type that supports enhanced networking. For more information on Placement Groups, please visit the below URL: &lt;a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AWS CloudTrail is an AWS service that helps you enable governance, compliance, and operational and risk auditing of your AWS account. Actions taken by a user, role, or an AWS service are recorded as events in CloudTrail. Events include actions taken in the AWS Management Console, AWS Command Line Interface, and AWS SDKs and APIs. Visibility into your AWS account activity is a key aspect of security and operational best practices. You can use CloudTrail to view, search, download, archive, analyze, and respond to account activity across your AWS infrastructure. You can identify who or what took which action, what resources were acted upon, when the event occurred, and other details to help you analyze and respond to activity in your AWS account. For more information on Cloudtrail, please visit the below URL: &lt;a href="http://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Custom resources enable you to write custom provisioning logic in templates that AWS CloudFormation runs anytime you create, update (if you changed the custom resource), or delete stacks. For example, you might want to include resources that aren’t available as AWS CloudFormation resource types. You can include those resources by using custom resources. That way you can still manage all your related resources in a single stack. Use the AWS::CloudFormation::CustomResource or Custom::String resource type to define custom resources in your templates. Custom resources require one property: the service token, which specifies where AWS CloudFormation sends requests to, such as an Amazon SNS topic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Failover routing lets you route traffic to a resource when the resource is healthy or to a different resource when the first resource is unhealthy. The primary and secondary resource record sets can route traffic to anything from an Amazon S3 bucket that is configured as a website to a complex tree of records. For more information on Route53 Failover Routing, please visit the below URL: &lt;a href="http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html" rel="noopener noreferrer"&gt;http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Deployment Types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single Target Deployment - small dev projects, legacy or non-HA infrastructure; outage occurs in case of failure, testing opportunity is limited.&lt;/li&gt;
&lt;li&gt;All-at-Once Deployment - deployment happens on multiple targets at the same time, requires orchestration tools, suitable for non-critical apps in the 5-10 target range.&lt;/li&gt;
&lt;li&gt;Minimum in-service Deployment - keeps a minimum number of targets in service and deploys in multiple stages, suitable for large environments, allows automated testing, no downtime.&lt;/li&gt;
&lt;li&gt;Rolling Deployments - x targets per stage, happens in multiple stages, the next stage begins after the previous one completes, orchestration and health checks required, can be the least efficient option if x is small, allows automated testing, no downtime as long as x is not large enough to impact the application, can be paused, allowing multi-version testing.&lt;/li&gt;
&lt;li&gt;Blue Green Deployment - deploy to a separate Green environment, update the code on Green, extra cost due to the duplicate environment during deployment, deployment is rapid, cutover and migration are clean (DNS change), rollback is easy (DNS regression), can be fully automated using CFN etc. Binary: no traffic split, not used for feature testing.&lt;/li&gt;
&lt;li&gt;A/B Testing - distributes traffic between blue and green, allows gradual performance/stability/health analysis, allows new feature testing, rollback is quick, the end goal of A/B testing is not migration, uses Route 53 for DNS resolution with 2 records, one pointing to A and the other to B, weighted/round robin.&lt;/li&gt;
&lt;/ul&gt;
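&lt;p&gt;The rolling deployment idea above (x targets per stage, one stage after another) can be sketched as a simple batch plan. This is only an illustration: the function name and the target names are hypothetical, and a real orchestration tool would also run health checks between stages.&lt;/p&gt;

```python
def plan_rolling_batches(targets, batch_size):
    """Split targets into consecutive stages of at most batch_size targets."""
    if not batch_size >= 1:
        raise ValueError("batch_size must be at least 1")
    return [
        targets[start:start + batch_size]
        for start in range(0, len(targets), batch_size)
    ]


if __name__ == "__main__":
    # Hypothetical target names; five targets updated two per stage.
    targets = ["web-1", "web-2", "web-3", "web-4", "web-5"]
    for stage, batch in enumerate(plan_rolling_batches(targets, 2), start=1):
        # Targets outside the current batch keep serving traffic.
        in_service = len(targets) - len(batch)
        print(f"stage {stage}: update {batch}; {in_service} targets keep serving")
```

&lt;p&gt;A smaller batch size keeps more targets in service at each stage but increases the number of stages, which is the efficiency trade-off noted in the list.&lt;/p&gt;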


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Intrinsic &amp;amp; Conditional Functions&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Intrinsic Fn - built-in functions provided by AWS to help manage, reference, and conditionally act upon resources, situations &amp;amp; inputs to a stack. &lt;/li&gt;
&lt;li&gt;Fn::Base64 - Base64 encoding for User Data&lt;/li&gt;
&lt;li&gt;Fn::FindInMap - Mapping lookup &lt;/li&gt;
&lt;li&gt;Fn::GetAtt - Advanced reference look up &lt;/li&gt;
&lt;li&gt;Fn::GetAZs - retrieve list of AZs in a region &lt;/li&gt;
&lt;li&gt;Fn::Join - construct complex strings; concatenate strings &lt;/li&gt;
&lt;li&gt;Fn::Select - value selection from a list by zero-based index (0, 1, ...) &lt;/li&gt;
&lt;li&gt;Ref - default value of resource &lt;/li&gt;
&lt;li&gt;Conditional Functions - Fn::And, Fn::Equals, Fn::If, Fn::Not, Fn::Or&lt;/li&gt;
&lt;/ul&gt;
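&lt;p&gt;To see how these functions fit together, here is a minimal sketch that builds a template as a Python dictionary and prints it as JSON. The AMI IDs, instance types, and logical names (RegionMap, IsProd, WebServer) are hypothetical placeholders; only the function syntax is the point.&lt;/p&gt;

```python
import json

# Illustration only: AMI IDs, instance types, and logical names are placeholders.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        "EnvironmentType": {
            "Type": "String",
            "AllowedValues": ["prod", "test"],
            "Default": "test",
        }
    },
    "Mappings": {
        "RegionMap": {
            "us-east-1": {"AMI": "ami-11111111"},
            "eu-west-1": {"AMI": "ami-22222222"},
        }
    },
    "Conditions": {
        # Conditional function: true when the parameter equals "prod".
        "IsProd": {"Fn::Equals": [{"Ref": "EnvironmentType"}, "prod"]}
    },
    "Resources": {
        "WebServer": {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                # Mapping lookup keyed by the region the stack runs in.
                "ImageId": {"Fn::FindInMap": ["RegionMap", {"Ref": "AWS::Region"}, "AMI"]},
                # Pick a value based on the IsProd condition.
                "InstanceType": {"Fn::If": ["IsProd", "m5.large", "t3.micro"]},
                # First Availability Zone of the current region.
                "AvailabilityZone": {"Fn::Select": ["0", {"Fn::GetAZs": ""}]},
                # User data must be Base64 encoded.
                "UserData": {"Fn::Base64": "#!/bin/bash\nyum install -y httpd\n"},
                "Tags": [
                    {
                        "Key": "Name",
                        "Value": {"Fn::Join": ["-", ["web", {"Ref": "EnvironmentType"}]]},
                    }
                ],
            },
        }
    },
    "Outputs": {
        # Attribute lookup on a resource created in this template.
        "WebServerDns": {"Value": {"Fn::GetAtt": ["WebServer", "PublicDnsName"]}}
    },
}

template_body = json.dumps(template, indent=2)
print(template_body)
```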


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;CFN Resource Deletion Policies&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A policy/setting which is associated with each resource in a template; A way to control what happens to each resource when a stack is deleted.&lt;/li&gt;
&lt;li&gt;Policy value - Delete (Default), Retain, Snapshot&lt;/li&gt;
&lt;li&gt;Delete - useful for testing environments, CI/CD/QA workflows, presales, short-lifecycle/immutable environments.&lt;/li&gt;
&lt;li&gt;Retain - the resource lives beyond the lifecycle of the stack; Windows Server Platform (AD), servers with state, SQL, Exchange, file servers, non-immutable architectures.&lt;/li&gt;
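&lt;li&gt;Snapshot - restricted policy type, available only for resources that support snapshots, such as EBS volumes (RDS DB instances and Redshift clusters are other examples); takes a snapshot before deleting so the data can be recovered.&lt;/li&gt;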
&lt;/ul&gt;
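&lt;p&gt;As a rough sketch of the policies above, the snippet below declares three resources with different DeletionPolicy values and lists which policy would apply to each when the stack is deleted. The logical IDs and the helper function are hypothetical, and the helper only inspects the dictionary locally; it does not call AWS.&lt;/p&gt;

```python
import json

# Hypothetical logical IDs; the DeletionPolicy attribute is the point here.
resources = {
    # No DeletionPolicy set, so the default (Delete) applies.
    "ScratchQueue": {"Type": "AWS::SQS::Queue"},
    # Kept after the stack is deleted.
    "AuditBucket": {"Type": "AWS::S3::Bucket", "DeletionPolicy": "Retain"},
    # A snapshot is taken before the volume is deleted.
    "DataVolume": {
        "Type": "AWS::EC2::Volume",
        "DeletionPolicy": "Snapshot",
        "Properties": {
            "Size": 100,
            "AvailabilityZone": {"Fn::Select": ["0", {"Fn::GetAZs": ""}]},
        },
    },
}


def effective_deletion_policies(resources, default="Delete"):
    """Map each logical ID to the policy that applies on stack deletion."""
    return {
        logical_id: resource.get("DeletionPolicy", default)
        for logical_id, resource in resources.items()
    }


if __name__ == "__main__":
    print(json.dumps({"Resources": resources}, indent=2))
    for logical_id, policy in effective_deletion_policies(resources).items():
        print(f"{logical_id}: {policy}")
```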


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Immutable Architecture - Replace infra instead of upgrading or repairing faulty components, treat servers as unchangeable objects, don't diagnose and fix, throw away and re-create, nothing bootstrapped except the AMI.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;CFN Stack updates&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;During an update the stack policy is checked, and updates can be prevented; in the absence of a stack policy, all updates are allowed.&lt;/li&gt;
&lt;li&gt;A stack policy cannot be deleted once applied. Once a stack policy is applied, ALL objects are protected and updates are denied by default; to override the default DENY, an explicit Allow is required. A statement can be applied to a single resource (logical ID), a wildcard, or NotResource; it has Principal and Action; a Condition element (resource type) can also be used.&lt;/li&gt;
&lt;/ul&gt;
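&lt;p&gt;A stack policy along these lines might look like the sketch below, built as a Python dictionary and printed as JSON. The logical ID ProductionDatabase is a hypothetical example; the first statement is the explicit Allow that lifts the default deny, and the others protect specific resources.&lt;/p&gt;

```python
import json

# Sketch of a stack policy document; "ProductionDatabase" is a hypothetical logical ID.
stack_policy = {
    "Statement": [
        {
            # Explicit allow: without it, every update is denied by default.
            "Effect": "Allow",
            "Action": "Update:*",
            "Principal": "*",
            "Resource": "*",
        },
        {
            # Protect one resource, addressed by its logical ID.
            "Effect": "Deny",
            "Action": ["Update:Replace", "Update:Delete"],
            "Principal": "*",
            "Resource": "LogicalResourceId/ProductionDatabase",
        },
        {
            # Protect every resource of a given type through a Condition element.
            "Effect": "Deny",
            "Action": "Update:*",
            "Principal": "*",
            "Resource": "*",
            "Condition": {
                "StringEquals": {"ResourceType": ["AWS::RDS::DBInstance"]}
            },
        },
    ]
}

stack_policy_body = json.dumps(stack_policy, indent=2)
print(stack_policy_body)
```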


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Stack updates: 4 Types - Update with No Interruption, Some Interruption, Replacement, Delete&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;
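&lt;p&gt;The weighted routing and A/B testing notes in the list above rely on one piece of arithmetic: each record receives a share of traffic equal to its weight divided by the sum of all weights. A minimal sketch, with hypothetical record names and a hypothetical helper function:&lt;/p&gt;

```python
from fractions import Fraction


def traffic_shares(weights):
    """Return each record's share of traffic as weight / sum of all weights."""
    total = sum(weights.values())
    if total == 0:
        # The all-zero case is deliberately left out of this sketch.
        raise ValueError("this sketch needs at least one non-zero weight")
    return {name: Fraction(weight, total) for name, weight in weights.items()}


if __name__ == "__main__":
    # Hypothetical blue/green records with a 90/10 weighting.
    shares = traffic_shares({"blue": 90, "green": 10})
    for name, share in shares.items():
        print(f"{name}: {float(share):.0%} of requests")
```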

</description>
      <category>aws</category>
      <category>devops</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Sagemaker Model deployment and Integration</title>
      <dc:creator>Amit Kayal</dc:creator>
      <pubDate>Wed, 18 Jan 2023 07:29:11 +0000</pubDate>
      <link>https://dev.to/aws-builders/sagemaker-model-deployment-and-integration-2l6c</link>
      <guid>https://dev.to/aws-builders/sagemaker-model-deployment-and-integration-2l6c</guid>
      <description>&lt;h1&gt;
  
  
  Sagemaker Model deployment and Integration
&lt;/h1&gt;

&lt;h2&gt;
  
  
  AWS Feature store
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/sagemaker/feature-store/" rel="noopener noreferrer"&gt;SageMaker Feature Store&lt;/a&gt; is a purpose-built solution for ML feature management. It helps data science teams reuse ML features across teams and models, serve features for model predictions at scale with low latency, and train and deploy new models more quickly and effectively.&lt;/p&gt;

&lt;p&gt;Refer to the notebook &lt;a href="https://github.com/aws-samples/ml-lineage-helper/blob/main/examples/example.ipynb" rel="noopener noreferrer"&gt;https://github.com/aws-samples/ml-lineage-helper/blob/main/examples/example.ipynb&lt;/a&gt; for more details.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3gpthlqmcqtvgoq5vloj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3gpthlqmcqtvgoq5vloj.png" alt="im" width="696" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is feature lineage important?
&lt;/h3&gt;

&lt;p&gt;Imagine trying to manually track all of this for a large team, multiple teams, or even multiple business units. Lineage tracking and querying helps make this more manageable and helps organizations move to ML at scale. The following are four examples of how feature lineage helps scale the ML process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Build confidence for reuse of existing features&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Avoid reinventing features that are based on the same raw data as existing features&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Troubleshoot and audit models and model predictions&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Manage features proactively&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  AWS ML Lens and built-in models
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foe752fq1d6ubqp0vjx03.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foe752fq1d6ubqp0vjx03.PNG" alt="im" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22dz1xo92o6rvnjlo7zl.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22dz1xo92o6rvnjlo7zl.PNG" alt="im" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Deployment Options
&lt;/h1&gt;

&lt;p&gt;ML inference can be done in real time on individual records, such as with a REST API endpoint. Inference can also be done in batch mode as a processing job on a large dataset. While both approaches push data through a model, each has its own target goal when running inference at scale.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;* Real Time*&lt;/th&gt;
&lt;th&gt;&lt;em&gt;Micro Batch&lt;/em&gt;&lt;/th&gt;
&lt;th&gt;&lt;em&gt;Batch&lt;/em&gt;&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;*&lt;em&gt;Execution Mode *&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;Synchronous&lt;/td&gt;
&lt;td&gt;Synchronous/Asynchronous&lt;/td&gt;
&lt;td&gt;Asynchronous&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;*&lt;em&gt;Prediction Latency *&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;Subsecond&lt;/td&gt;
&lt;td&gt;Seconds to minutes&lt;/td&gt;
&lt;td&gt;Indefinite&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Bounds&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unbounded/stream&lt;/td&gt;
&lt;td&gt;Bounded&lt;/td&gt;
&lt;td&gt;Bounded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;*&lt;em&gt;Execution Frequency *&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;Variable/fixed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;*&lt;em&gt;Invocation Mode *&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;Continuous stream/API calls&lt;/td&gt;
&lt;td&gt;Event-based&lt;/td&gt;
&lt;td&gt;Event-based/scheduled&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Examples&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Real-time REST API endpoint&lt;/td&gt;
&lt;td&gt;Data analyst running a SQL UDF&lt;/td&gt;
&lt;td&gt;Scheduled inference job&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h1&gt;
  
  
  Real-time deployment
&lt;/h1&gt;

&lt;p&gt;SageMaker real-time deployment follows the approach shown below. The key point is that the inference pipeline can be coupled with autoscaling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1enzoeiv9tfr3tzqxjm.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1enzoeiv9tfr3tzqxjm.PNG" alt="im" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are several ways to deploy a real-time endpoint with SageMaker. The options range from a prebuilt container to your own model and your own container.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7qu02wcm6yyjwwhzwmu.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7qu02wcm6yyjwwhzwmu.PNG" alt="im" width="800" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With a SageMaker prebuilt container and the inference script it ships with, the deployment looks as shown below. &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvqtgdqh809cnifspdi4.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvqtgdqh809cnifspdi4.PNG" alt="im" width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Quite often we add our own inference script, which is straightforward, as shown below. &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftcbmd1e0636z7r3cs70w.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftcbmd1e0636z7r3cs70w.PNG" alt="im" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is also common to bring our own container and our own trained model along with an inference script. The architecture stays the same in that case, as shown below. &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqeo555pymn5de7r0jpv.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqeo555pymn5de7r0jpv.PNG" alt="im" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Autoscale
&lt;/h3&gt;

&lt;p&gt;We can set an autoscaling policy for a SageMaker endpoint so that it scales out and in automatically. &lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe9ppfuwozwag6o0hc2e2.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe9ppfuwozwag6o0hc2e2.PNG" alt="im" width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

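&lt;p&gt;The autoscaling policy has to be set up for the endpoint. You can see here that &lt;code&gt;ServiceNamespace&lt;/code&gt; is set to &lt;code&gt;sagemaker&lt;/code&gt; and &lt;code&gt;ResourceId&lt;/code&gt; identifies the endpoint variant, in the form &lt;code&gt;endpoint/&amp;lt;endpoint-name&amp;gt;/variant/&amp;lt;variant-name&amp;gt;&lt;/code&gt;.&lt;/p&gt;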

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cito27qme7jkwk097ug.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cito27qme7jkwk097ug.PNG" alt="im" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-model endpoint
&lt;/h2&gt;

&lt;p&gt;SageMaker multi-model endpoints work with several frameworks, such as TensorFlow, PyTorch, MXNet, and sklearn, and you can &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/build-multi-model-build-container.html" rel="noopener noreferrer"&gt;build your own container with a multi-model server.&lt;/a&gt; Multi-model endpoints are also supported natively in the following popular SageMaker built-in algorithms: &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html" rel="noopener noreferrer"&gt;XGBoost&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html" rel="noopener noreferrer"&gt;Linear Learner&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/randomcutforest.html" rel="noopener noreferrer"&gt;Random Cut Forest&lt;/a&gt; (RCF), and &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/k-nearest-neighbors.html" rel="noopener noreferrer"&gt;K-Nearest Neighbors&lt;/a&gt; (KNN). &lt;/p&gt;

&lt;p&gt;Refer to the notebook &lt;a href="https://github.com/aws-samples/sagemaker-multi-model-endpoint-tensorflow-computer-vision/blob/main/multi-model-endpoint-tensorflow-cv.ipynb" rel="noopener noreferrer"&gt;https://github.com/aws-samples/sagemaker-multi-model-endpoint-tensorflow-computer-vision/blob/main/multi-model-endpoint-tensorflow-cv.ipynb&lt;/a&gt; to understand how to deploy this. Also see the blog post &lt;a href="https://aws.amazon.com/blogs/machine-learning/save-on-inference-costs-by-using-amazon-sagemaker-multi-model-endpoints/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/machine-learning/save-on-inference-costs-by-using-amazon-sagemaker-multi-model-endpoints/&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;All of the models that are hosted on a multi-model endpoint must share the same serving container image. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multi-model endpoints can improve endpoint utilization when your models are of similar size, share the same container image, and have similar invocation latency requirements.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;All of the model artifacts must be stored under the same S3 prefix.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe0zo39qdu5vgu15dvzi7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe0zo39qdu5vgu15dvzi7.jpg" alt="im" width="800" height="552"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sf346dtlj3r28w14wh5.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sf346dtlj3r28w14wh5.gif" alt="im" width="800" height="257"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost advantages
&lt;/h3&gt;

&lt;p&gt;The following diagram compares running 10 models on a multi-model endpoint with using 10 separate endpoints. In this example the multi-model endpoint saves $3,000 per month. Multi-model endpoints can easily scale to hundreds or thousands of models. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0ho24l9vl6vcvkyqv6e.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0ho24l9vl6vcvkyqv6e.gif" alt="img" width="800" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How to use?
&lt;/h3&gt;

&lt;p&gt;To create a multi-model endpoint in Amazon SageMaker, choose the multi-model option, provide the inference serving container image path, and provide the &lt;a href="http://aws.amazon.com/s3" rel="noopener noreferrer"&gt;Amazon S3&lt;/a&gt; prefix in which the trained model artifacts are stored. You can organize your models in S3 any way you wish, so long as they all use the same prefix. &lt;/p&gt;

&lt;p&gt;When you invoke the multi-model endpoint, you provide the relative path of a specific model with the new TargetModel parameter of InvokeEndpoint. To add models to the multi-model endpoint, simply store a newly trained model artifact in S3 under the prefix associated with the endpoint. The model will then be immediately available for invocations. &lt;/p&gt;

&lt;p&gt;To update a model already in use, add the model to S3 with a new name and begin invoking the endpoint with the new model name. To stop using a model deployed on a multi-model endpoint, stop invoking the model and delete it from S3.&lt;/p&gt;
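&lt;p&gt;As a rough illustration of the invocation flow, the sketch below builds the arguments for an &lt;code&gt;InvokeEndpoint&lt;/code&gt; call that targets one model on a multi-model endpoint. The endpoint name, the artifact paths, and the JSON payload shape are hypothetical. Only &lt;code&gt;EndpointName&lt;/code&gt;, &lt;code&gt;TargetModel&lt;/code&gt;, &lt;code&gt;ContentType&lt;/code&gt;, and &lt;code&gt;Body&lt;/code&gt; are real request parameters.&lt;/p&gt;

```python
import json


def build_mme_invocation(endpoint_name, model_artifact, payload):
    """Build keyword arguments for invoke_endpoint on a multi-model endpoint.

    model_artifact is the path of the model archive relative to the
    S3 prefix configured on the endpoint.
    """
    return {
        "EndpointName": endpoint_name,
        "TargetModel": model_artifact,
        "ContentType": "application/json",
        "Body": json.dumps(payload).encode("utf-8"),
    }


# Hypothetical names: an endpoint and two versions of the same model in S3
request_v1 = build_mme_invocation(
    "my-multi-model-endpoint", "churn/model-v1.tar.gz", {"inputs": [1.0, 2.0]}
)
# "Updating" a model means uploading it under a new name and targeting that name
request_v2 = build_mme_invocation(
    "my-multi-model-endpoint", "churn/model-v2.tar.gz", {"inputs": [1.0, 2.0]}
)
print(request_v2["TargetModel"])
```

&lt;p&gt;With boto3 available, each dictionary could be sent with &lt;code&gt;boto3.client("sagemaker-runtime").invoke_endpoint(**request)&lt;/code&gt;. Nothing about the endpoint itself changes between the two requests, only the &lt;code&gt;TargetModel&lt;/code&gt; value.&lt;/p&gt;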

&lt;p&gt;Instead of downloading all the models into the container from S3 when the endpoint is created, Amazon SageMaker multi-model endpoints dynamically load models from S3 when invoked. As a result, an initial invocation to a model might see higher inference latency than the subsequent inferences, which are completed with low latency. If the model is already loaded on the container when invoked, then the download step is skipped and the model returns the inferences with low latency. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sf346dtlj3r28w14wh5.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sf346dtlj3r28w14wh5.gif" alt="im" width="800" height="257"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Monitoring multi-model endpoints using Amazon CloudWatch metrics
&lt;/h3&gt;

&lt;p&gt;To make price and performance tradeoffs, you will want to test multi-model endpoints with models and representative traffic from your own application. Amazon SageMaker provides additional metrics in CloudWatch for multi-model endpoints so you can determine the endpoint usage and the cache hit rate and optimize your endpoint. The metrics are as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ModelLoadingWaitTime&lt;/strong&gt; – The interval of time that an invocation request waits for the target model to be downloaded or loaded to perform the inference.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ModelUnloadingTime&lt;/strong&gt; – The interval of time that it takes to unload the model through the container’s &lt;code&gt;UnloadModel&lt;/code&gt; API call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ModelDownloadingTime&lt;/strong&gt; – The interval of time that it takes to download the model from S3.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ModelLoadingTime&lt;/strong&gt; – The interval of time that it takes to load the model through the container’s &lt;code&gt;LoadModel&lt;/code&gt; API call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ModelCacheHit&lt;/strong&gt; – The number of &lt;code&gt;InvokeEndpoint&lt;/code&gt; requests sent to the endpoint where the model was already loaded. Taking the Average statistic shows the ratio of requests in which the model was already loaded.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LoadedModelCount&lt;/strong&gt; – The number of models loaded in the containers in the endpoint. This metric is emitted per instance. The &lt;code&gt;Average&lt;/code&gt; statistic with a period of 1 minute tells you the average number of models loaded per instance, and the &lt;code&gt;Sum&lt;/code&gt; statistic tells you the total number of models loaded across all instances in the endpoint. The models that this metric tracks are not necessarily unique because you can load a model in multiple containers in the endpoint.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;You can use CloudWatch charts to help make ongoing decisions on the optimal choice of instance type, instance count, and number of models that a given endpoint should host.&lt;/strong&gt; &lt;/p&gt;
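&lt;p&gt;To make the cache hit rate concrete, here is a small sketch that builds a CloudWatch &lt;code&gt;get_metric_statistics&lt;/code&gt; query for &lt;code&gt;ModelCacheHit&lt;/code&gt;. It assumes the metric is published in the &lt;code&gt;AWS/SageMaker&lt;/code&gt; namespace with &lt;code&gt;EndpointName&lt;/code&gt; and &lt;code&gt;VariantName&lt;/code&gt; dimensions. The endpoint name, variant name, and time window are hypothetical.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone


def build_cache_hit_query(endpoint_name, variant_name="AllTraffic",
                          hours=1, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    The Average statistic of ModelCacheHit is the share of requests
    that found the model already loaded.
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SageMaker",  # assumed namespace for endpoint metrics
        "MetricName": "ModelCacheHit",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 60,  # one datapoint per minute
        "Statistics": ["Average"],
    }


# A fixed timestamp keeps the example reproducible
fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
query = build_cache_hit_query("my-multi-model-endpoint", now=fixed_now)
print(query["MetricName"], query["StartTime"].isoformat())
```

&lt;p&gt;The dictionary could be passed to &lt;code&gt;boto3.client("cloudwatch").get_metric_statistics(**query)&lt;/code&gt;. A persistently low average suggests that models are being evicted and reloaded often, which is one signal for choosing a larger instance type or fewer models per endpoint.&lt;/p&gt;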

&lt;h2&gt;
  
  
  SageMaker inference pipeline
&lt;/h2&gt;

&lt;p&gt;You can use trained models in an inference pipeline to make real-time predictions directly without performing external preprocessing. When you configure the pipeline, you can choose to use the built-in feature transformers already available in Amazon SageMaker. Or, you can implement your own transformation logic using just a few lines of scikit-learn or Spark code.&lt;/p&gt;

&lt;p&gt;Refer to &lt;a href="https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-python-sdk/scikit_learn_inference_pipeline/Inference%20Pipeline%20with%20Scikit-learn%20and%20Linear%20Learner.html" rel="noopener noreferrer"&gt;https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-python-sdk/scikit_learn_inference_pipeline/Inference%20Pipeline%20with%20Scikit-learn%20and%20Linear%20Learner.html&lt;/a&gt; or &lt;a href="https://catalog.us-east-1.prod.workshops.aws/workshops/f238037c-8f0b-446e-9c15-ebcc4908901a/en-US/002-services/003-machine-learning/020-sagemaker" rel="noopener noreferrer"&gt;https://catalog.us-east-1.prod.workshops.aws/workshops/f238037c-8f0b-446e-9c15-ebcc4908901a/en-US/002-services/003-machine-learning/020-sagemaker&lt;/a&gt; for more details.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An inference pipeline lets you host multiple models behind a single endpoint. In this case the models form a sequential chain covering the steps required for inference: you can take your data transformation model, your predictor model, and your post-processing transformer and host them so that they run in sequence behind a single endpoint.&lt;/li&gt;
&lt;li&gt;As you can see in this picture, the inference request comes into the endpoint and the first model, your data transformation, is invoked. Its output is then passed to the next step, the predictor model (XGBoost in this example).

&lt;ul&gt;
&lt;li&gt;That output is passed on in turn until the final step in the pipeline returns the post-processed response to the inference request.&lt;/li&gt;
&lt;li&gt;This lets you couple your pre- and post-processing code behind the same endpoint and helps keep your training and inference code synchronized.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77s20por3defocmh57xy.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77s20por3defocmh57xy.PNG" alt="im" width="800" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  SageMaker production variants
&lt;/h2&gt;

&lt;p&gt;Amazon SageMaker enables you to test multiple models or model versions behind the same endpoint using production variants. Each production variant identifies a machine learning (ML) model and the resources deployed for hosting the model. By using production variants, you can test ML models that have been trained using different datasets, trained using different algorithms and ML frameworks, or deployed to different instance types, or any combination of these. You can distribute endpoint invocation requests across multiple production variants by providing the traffic distribution for each variant, or you can invoke a specific variant directly for each request. In this topic, we look at both methods for testing ML models.&lt;/p&gt;

&lt;p&gt;Refer to the notebook &lt;a href="https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_endpoints/a_b_testing/a_b_testing.html" rel="noopener noreferrer"&gt;https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_endpoints/a_b_testing/a_b_testing.html&lt;/a&gt; for implementation details.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test models by specifying traffic distribution
&lt;/h3&gt;

&lt;p&gt;Specify the percentage of the traffic that gets routed to each model by specifying the weight for each production variant in the endpoint configuration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F370os14y9wo7g1w0nwc3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F370os14y9wo7g1w0nwc3.png" alt="im" width="676" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Test models by invoking specific variants
&lt;/h3&gt;

&lt;p&gt;Specify the specific version of the model you want to invoke by providing a value for the &lt;code&gt;TargetVariant&lt;/code&gt; parameter when you call &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html" rel="noopener noreferrer"&gt;InvokeEndpoint&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftg60eazu4x6xn1p5acvv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftg60eazu4x6xn1p5acvv.png" alt="im" width="693" height="482"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Amazon SageMaker Batch Transform: Batch Inference
&lt;/h1&gt;

&lt;p&gt;We’ll use a SageMaker Batch Transform job with a trained machine learning model. It is assumed that we have already trained the model, pushed the Docker image to ECR, and registered the model in SageMaker. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We need the identifier of the SageMaker model we want to use and the location of the input data.&lt;/li&gt;
&lt;li&gt;Either use a built-in container for the inference image or bring your own.&lt;/li&gt;
&lt;li&gt;Batch Transform &lt;strong&gt;partitions the Amazon S3 objects in the input by key and maps Amazon S3 objects to instances&lt;/strong&gt;. When you have multiple files, one instance might process &lt;code&gt;input1.csv&lt;/code&gt; while another instance processes &lt;code&gt;input2.csv&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Batch Transform you provide your inference data as an S3 URI, and SageMaker takes care of downloading it, running the predictions, and uploading the results to S3 afterwards. You can find more documentation for Batch Transform &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you trained a model using the Hugging Face Estimator, call the &lt;code&gt;transformer()&lt;/code&gt; method to create a transform job for a model based on the training job (see &lt;a href="https://sagemaker.readthedocs.io/en/stable/overview.html#sagemaker-batch-transform" rel="noopener noreferrer"&gt;here&lt;/a&gt; for more details). Also refer to &lt;a href="https://huggingface.co/docs/sagemaker/inference" rel="noopener noreferrer"&gt;https://huggingface.co/docs/sagemaker/inference&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;The transformer object (&lt;code&gt;batch_job&lt;/code&gt;) defines the compute resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;instance count&lt;/li&gt;
&lt;li&gt;instance type&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;transform()&lt;/code&gt; call defines the input:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;data location&lt;/li&gt;
&lt;li&gt;content type&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;batch_job = huggingface_estimator.transformer(
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    strategy='SingleRecord')


batch_job.transform(
    data='s3://s3-uri-to-batch-data',
    content_type='application/json',    
    split_type='Line')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to run your batch transform job later or with a model from the 🤗 Hub, create a &lt;code&gt;HuggingFaceModel&lt;/code&gt; instance and then call the &lt;code&gt;transformer()&lt;/code&gt; method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sagemaker.huggingface.model import HuggingFaceModel

# Hub model configuration &amp;lt;https://huggingface.co/models&amp;gt;
hub = {
    'HF_MODEL_ID':'distilbert-base-uncased-finetuned-sst-2-english',
    'HF_TASK':'text-classification'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   env=hub,                                                # configuration for loading model from Hub
   role=role,                                              # IAM role with permissions to create an endpoint
   transformers_version="4.6",                             # Transformers version used
   pytorch_version="1.7",                                  # PyTorch version used
   py_version='py36',                                      # Python version used
)

# create transformer to run a batch job
batch_job = huggingface_model.transformer(
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    output_path=output_s3_path, # we are using the same s3 path to save the output with the input
    strategy='SingleRecord'
)

# starts batch transform job and uses S3 data as input
batch_job.transform(
    data='s3://sagemaker-s3-demo-test/samples/input.jsonl',
    content_type='application/json',    
    split_type='Line'
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;input.jsonl&lt;/code&gt; looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"inputs":"this movie is terrible"}
{"inputs":"this movie is amazing"}
{"inputs":"SageMaker is pretty cool"}
{"inputs":"SageMaker is pretty cool"}
{"inputs":"this movie is terrible"}
{"inputs":"this movie is amazing"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the job has finished, the result file can be downloaded from S3 and parsed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from ast import literal_eval
from sagemaker.s3 import S3Downloader, s3_path_join

# dataset_jsonl_file is assumed to hold the input file name (for example "input.jsonl")
# and output_s3_path the output location passed to the transformer
# creating s3 uri for result file -&amp;gt; input file + .out
output_file = f"{dataset_jsonl_file}.out"
output_path = s3_path_join(output_s3_path, output_file)

# download file
S3Downloader.download(output_path, '.')

batch_transform_result = []
with open(output_file) as f:
    for line in f:
        # converts jsonline array to normal array
        line = "[" + line.replace("[", "").replace("]", ",") + "]"
        # extend instead of assign so results from every line are kept
        batch_transform_result.extend(literal_eval(line))

# print results
print(batch_transform_result[:3])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;📓 Open the &lt;a href="https://github.com/huggingface/notebooks/blob/main/sagemaker/12_batch_transform_inference/sagemaker-notebook.ipynb" rel="noopener noreferrer"&gt;notebook&lt;/a&gt; for an example of how to run a batch transform job for inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Speeding up the processing
&lt;/h2&gt;

&lt;p&gt;We have only one instance running, so processing the entire file may take some time. We can increase the number of instances using the &lt;code&gt;instance_count&lt;/code&gt; parameter to speed it up. We can also send multiple requests to the Docker container simultaneously. To configure concurrent transformations, use the &lt;code&gt;max_concurrent_transforms&lt;/code&gt; parameter.&lt;/p&gt;
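&lt;p&gt;For reference, the same two settings appear in the low-level &lt;code&gt;create_transform_job&lt;/code&gt; API as &lt;code&gt;TransformResources.InstanceCount&lt;/code&gt; and &lt;code&gt;MaxConcurrentTransforms&lt;/code&gt;. The sketch below only builds such a request. The job name, model name, S3 locations, and the chosen counts are hypothetical, and this is the boto3 equivalent rather than the SDK call used earlier.&lt;/p&gt;

```python
def build_transform_job_request(job_name, model_name, input_s3_uri,
                                output_s3_uri, instance_count=2,
                                max_concurrent_transforms=4):
    """Build keyword arguments for the SageMaker create_transform_job call."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        # parallel requests sent to each container
        "MaxConcurrentTransforms": max_concurrent_transforms,
        "BatchStrategy": "SingleRecord",
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "application/json",
            "SplitType": "Line",
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        # more instances let input files be processed in parallel
        "TransformResources": {
            "InstanceType": "ml.p3.2xlarge",
            "InstanceCount": instance_count,
        },
    }


# Hypothetical job, model, and bucket names
job_request = build_transform_job_request(
    "sentiment-batch-job",
    "my-registered-model",
    "s3://my-bucket/batch/input/",
    "s3://my-bucket/batch/output/",
)
print(job_request["TransformResources"]["InstanceCount"],
      job_request["MaxConcurrentTransforms"])
```

&lt;p&gt;Since Batch Transform maps S3 objects to instances, extra instances mainly help when the input is split across several files. A single large file would still be handled by one instance.&lt;/p&gt;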

&lt;h2&gt;
  
  
  Processing the output
&lt;/h2&gt;

&lt;p&gt;In the end, we must get access to the output. We’ll find the output files in the location specified in the Transformer constructor. Every line contains the prediction and the input parameters.&lt;/p&gt;

</description>
      <category>motivation</category>
    </item>
  </channel>
</rss>
