<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ismail Kovvuru</title>
    <description>The latest articles on DEV Community by Ismail Kovvuru (@ismailkovvuru).</description>
    <link>https://dev.to/ismailkovvuru</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3255997%2Faed64b9d-6c57-4355-8b07-70bbcdea6603.jpg</url>
      <title>DEV Community: Ismail Kovvuru</title>
      <link>https://dev.to/ismailkovvuru</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ismailkovvuru"/>
    <language>en</language>
    <item>
      <title>AWS S3 Cross-Account Uploads Failing with 403 AccessDenied</title>
      <dc:creator>Ismail Kovvuru</dc:creator>
      <pubDate>Tue, 28 Oct 2025 16:01:32 +0000</pubDate>
      <link>https://dev.to/ismailkovvuru/aws-s3-cross-account-uploads-failing-with-403-accessdenied-20ed</link>
      <guid>https://dev.to/ismailkovvuru/aws-s3-cross-account-uploads-failing-with-403-accessdenied-20ed</guid>
      <description>&lt;p&gt;Learn how a simple missing permission in an AWS S3 Access Point policy caused 403 AccessDenied errors in cross-account uploads, even when IAM roles and bucket policies were correct. Step-by-step fix and prevention guide inside.&lt;/p&gt;

&lt;p&gt;A user saw &lt;code&gt;500 Internal Server Error&lt;/code&gt; on uploads. Tracing showed an S3 &lt;code&gt;AccessDenied (403)&lt;/code&gt; coming from an S3 Access Point that belonged to a different AWS account.&lt;/p&gt;

&lt;p&gt;The bucket policy allowed cross-account writes, but the &lt;em&gt;Access Point policy&lt;/em&gt; did &lt;strong&gt;not&lt;/strong&gt; include the Lambda role ARN — S3 blocked the request.&lt;/p&gt;

&lt;p&gt;Fix: add the source Lambda’s role ARN to the Access Point policy (or use an alternative cross-account pattern). After the Access Point policy was updated, uploads succeeded.&lt;/p&gt;

&lt;p&gt;Below is a clear, step-by-step explanation of &lt;strong&gt;what happened&lt;/strong&gt;, &lt;strong&gt;why it happened&lt;/strong&gt;, and &lt;strong&gt;how to fix and prevent&lt;/strong&gt; it, written so that both engineers and managers can follow.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. What, where, when, and how the problem showed up
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What happened (observable symptom):&lt;/strong&gt;&lt;br&gt;
User reported &lt;code&gt;File uploads to S3 are failing&lt;/code&gt;. The user-facing API returned &lt;code&gt;500 Internal Server Error&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where in the system:&lt;/strong&gt;&lt;br&gt;
Uploads flow: Frontend → API Gateway → Lambda (in Account A) → S3 Access Point → S3 bucket (owned by Account B). The Access Point resource was in &lt;strong&gt;another AWS account&lt;/strong&gt; (Account B).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When it happened:&lt;/strong&gt;&lt;br&gt;
At runtime when Lambda attempted to &lt;code&gt;PutObject&lt;/code&gt; through the Access Point to the destination account.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it presented in traces and logs:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Frontend initiated an upload request via API Gateway.&lt;/li&gt;
&lt;li&gt;API Gateway invoked a Lambda function.&lt;/li&gt;
&lt;li&gt;Lambda attempted an S3 PutObject operation.&lt;/li&gt;
&lt;li&gt;S3 returned 403 AccessDenied.&lt;/li&gt;
&lt;li&gt;The upstream API returned 500 Internal Server Error to the user, masking the actual permission failure from S3.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This translation of the 403 into a 500 response made troubleshooting initially misleading.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it mattered / got missed initially:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The team checked bucket health, the IAM role (which had broad permissions), and network connectivity; everything looked fine.&lt;/li&gt;
&lt;li&gt;The subtlety: S3 Access Points have their &lt;strong&gt;own resource policies&lt;/strong&gt; (separate from bucket policy). Even though the bucket policy allowed cross-account writes, the Access Point policy did &lt;strong&gt;not&lt;/strong&gt; include the Lambda role ARN as a principal — S3 denied the operation at the Access Point layer.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  2. Root cause
&lt;/h2&gt;

&lt;p&gt;An S3 Access Point is a resource that can have its &lt;strong&gt;own policy&lt;/strong&gt;. When a request goes through an Access Point, S3 enforces both the bucket policy and the Access Point policy. In this case:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The destination bucket’s policy allowed cross-account writes.&lt;/li&gt;
&lt;li&gt;The Access Point policy &lt;strong&gt;did not&lt;/strong&gt; allow the Lambda’s role (principal) from the source account.&lt;/li&gt;
&lt;li&gt;S3 rejected the &lt;code&gt;PutObject&lt;/code&gt; with &lt;code&gt;AccessDenied (403)&lt;/code&gt; at the Access Point layer.&lt;/li&gt;
&lt;li&gt;The Lambda (or API Gateway) didn’t translate that permission error into a meaningful client response, so the client only saw &lt;code&gt;500 Internal Server Error&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Lesson:&lt;/strong&gt; When cross-account operations use S3 Access Points, check the Access Point policy, not just the bucket policy or the IAM role.&lt;/p&gt;
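&lt;p&gt;As a rough mental model (an illustration, not the real IAM evaluator), a cross-account &lt;code&gt;PutObject&lt;/code&gt; through an Access Point behaves like a logical AND of the two resource policies. A minimal Node.js sketch, with a hypothetical function name:&lt;/p&gt;

```javascript
// Toy model of S3 Access Point policy layering for cross-account requests:
// the request succeeds only if BOTH the bucket policy AND the Access Point
// policy allow the caller. Illustrative only; not the real IAM engine.
function crossAccountPutAllowed(bucketPolicyAllows, accessPointPolicyAllows) {
  if (bucketPolicyAllows) {
    if (accessPointPolicyAllows) {
      return "200 OK";
    }
  }
  return "403 AccessDenied";
}

// The incident in this post: bucket policy allowed, Access Point policy did not.
console.log(crossAccountPutAllowed(true, false)); // "403 AccessDenied"
console.log(crossAccountPutAllowed(true, true));  // "200 OK"
```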
&lt;h2&gt;
  
  
  3. The exact fix that worked
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Fix performed:&lt;/strong&gt; Update the S3 Access Point policy in the destination account (Account B) to include the source Lambda execution role ARN from Account A as an allowed principal for &lt;code&gt;s3:PutObject&lt;/code&gt; (and other relevant S3 actions).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After the policy update:&lt;/strong&gt; Uploads succeeded and the API returned &lt;code&gt;200 OK&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  4. Concrete commands &amp;amp; policy examples
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; replace account IDs, ARNs, access point names and regions with your own.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  4.1 Inspect current Access Point policy (destination account)
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3control get-access-point-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--account-id&lt;/span&gt; 222233334444 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; app-uploads &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;If no policy is returned, that is itself relevant: with no Access Point policy in place, nothing at that layer grants access to a cross-account principal, so requests through the Access Point will be denied.&lt;/p&gt;
&lt;h3&gt;
  
  
  4.2 Minimal Access Point policy that allows a Lambda role in another account to PutObject
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;access-point-policy.json&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Sid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AllowLambdaPutFromAccountA"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Principal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"AWS"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:iam::111122223333:role/lambda-exec-role"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"s3:PutObject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"s3:PutObjectAcl"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:s3:us-east-1:222233334444:accesspoint/app-uploads/object/*"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3control put-access-point-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--account-id&lt;/span&gt; 222233334444 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; app-uploads &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy&lt;/span&gt; file://access-point-policy.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.3 Example bucket policy (destination account) that is compatible
&lt;/h3&gt;

&lt;p&gt;The bucket policy can additionally allow writes, but the Access Point policy must still explicitly allow the principal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Principal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"AWS"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:iam::111122223333:role/lambda-exec-role"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"s3:PutObject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:s3:::my-bucket/*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Condition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"StringEquals"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"aws:SourceAccount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"111122223333"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.4 Test a PutObject using the Access Point ARN (from source account)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3api put-object &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bucket&lt;/span&gt; arn:aws:s3:us-east-1:222233334444:accesspoint/app-uploads &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--key&lt;/span&gt; test.txt &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--body&lt;/span&gt; ./test.txt &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Expected success: the call completes and the CLI prints the object’s &lt;code&gt;ETag&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If AccessDenied, confirm both Access Point policy and bucket policy include appropriate principals and conditions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  5. Why the API showed &lt;code&gt;500&lt;/code&gt; instead of &lt;code&gt;403&lt;/code&gt; (and how to avoid masking)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; Lambda got a 403 from S3, but either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda code didn't catch/translate the exception and defaulted to an internal error, or&lt;/li&gt;
&lt;li&gt;API Gateway integration mapping converted the Lambda error into a generic 500.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How to avoid masking in future:&lt;/strong&gt; catch S3 errors and return meaningful HTTP status codes to clients.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example (Node.js Lambda) — catch and propagate S3 errors:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;AWS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;AWS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;S3&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;putObject&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;arn:aws:s3:us-east-1:222233334444:accesspoint/app-uploads&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;file.txt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hello&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;promise&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OK&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AccessDenied&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;statusCode&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Upload blocked: Access denied&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Unexpected S3 error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Internal Server Error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Recommendation:&lt;/strong&gt; surface the correct HTTP code for client-facing errors, and log the full S3 error payload for troubleshooting.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. When to use this solution, and when not to
&lt;/h2&gt;

&lt;h3&gt;
  
  
  When to use: include the source role ARN in the Access Point policy
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use when you &lt;strong&gt;need direct cross-account writes&lt;/strong&gt; through an Access Point (multi-tenant access patterns, VPC-restricted access, or when Access Points are an architecture requirement).&lt;/li&gt;
&lt;li&gt;Use when Access Points are used to manage fine-grained access to a large bucket by many consumers across accounts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fine-grained control at Access Point level.&lt;/li&gt;
&lt;li&gt;Scopes access specifically to that Access Point and object path (less blast radius than bucket policy alone).&lt;/li&gt;
&lt;li&gt;Works well with Access Point features (VPC restrictions, policy scoping).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cons / Risks:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requires managing principals across accounts — can be error-prone if roles rotate or change names.&lt;/li&gt;
&lt;li&gt;Policies can get complex; need automation for correctness.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When &lt;strong&gt;not&lt;/strong&gt; to use — alternatives
&lt;/h3&gt;

&lt;p&gt;If you don't require Access Points, or cross-account Access Point policy management is too heavy for your org, consider these alternatives:&lt;/p&gt;

&lt;h3&gt;
  
  
  Alternative A — Cross-account &lt;strong&gt;AssumeRole&lt;/strong&gt; (recommended for programmatic cross-account access)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Create a role in the destination account (Account B) that allows &lt;code&gt;s3:PutObject&lt;/code&gt; on the bucket. Grant Account A permission to assume that role.&lt;/li&gt;
&lt;li&gt;The Lambda in Account A calls &lt;code&gt;sts:AssumeRole&lt;/code&gt; to assume the role in Account B and calls S3 with the temporary credentials. This avoids managing resource policies that reference Account A principals.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When to choose:&lt;/strong&gt; if you prefer IAM role trust relationships and more centralized control; good for service-to-service cross-account interactions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sketch:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In Account B, create &lt;code&gt;S3WriteRole&lt;/code&gt; with bucket &lt;code&gt;PutObject&lt;/code&gt; permissions. Trust policy allows &lt;code&gt;arn:aws:iam::111122223333:role/lambda-exec-role&lt;/code&gt; (or the Account A principal) to assume it.&lt;/li&gt;
&lt;li&gt;In Lambda (Account A), call &lt;code&gt;sts.assumeRole&lt;/code&gt; to get temporary credentials, then call S3.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; centralized role in destination; simpler to audit.&lt;br&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; adds STS usage and role assumption step.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alternative B — Pre-signed URLs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Generate a pre-signed &lt;code&gt;PUT&lt;/code&gt; URL in Account B (or in Account A via a role in Account B). The frontend uses that URL to upload directly to S3. No cross-account Access Point policy is needed if the URL is signed by a principal that already has access.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When to choose:&lt;/strong&gt; when user uploads from browser/mobile and you want to avoid long-lived credentials or cross-account writes from Lambda.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt; simple client flow, least privilege on server.&lt;br&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; signature management; cannot perform server-side transformations before upload.&lt;/p&gt;

&lt;h3&gt;
  
  
  Alternative C — Use bucket policies (no Access Point)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;If Access Points are not required, a direct bucket policy that allows cross-account principals may be simpler.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When to choose:&lt;/strong&gt; single-tenant use, or small number of trusted accounts.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Diagnostics checklist / playbook (step-by-step)
&lt;/h2&gt;

&lt;p&gt;If an S3 upload fails and you see a 500 or 403, run through this checklist:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Trace the request path&lt;/strong&gt; — which resource did the client actually hit? (Direct bucket ARN, or Access Point ARN?)&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Check Lambda code: what &lt;code&gt;Bucket&lt;/code&gt; value does it pass to S3? If it uses an Access Point ARN, note the account id in that ARN.&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="2"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Check CloudTrail&lt;/strong&gt; for S3 Data Events (PutObject) to see the exact &lt;code&gt;errorCode&lt;/code&gt;/&lt;code&gt;errorMessage&lt;/code&gt;. CloudTrail shows which principal was used and whether the error was &lt;code&gt;AccessDenied&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Check Access Point policy&lt;/strong&gt; (destination account):&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   aws s3control get-access-point-policy &lt;span class="nt"&gt;--account-id&lt;/span&gt; DEST &lt;span class="nt"&gt;--name&lt;/span&gt; APNAME
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Check bucket policy&lt;/strong&gt; (destination account) for &lt;code&gt;PutObject&lt;/code&gt; allow/deny statements and &lt;code&gt;aws:SourceAccount&lt;/code&gt; or &lt;code&gt;aws:SourceArn&lt;/code&gt; conditions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Confirm the principal&lt;/strong&gt;: ensure the policy includes the Lambda's execution role ARN or the appropriate account principal.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Check Lambda IAM&lt;/strong&gt; (source account): verify Lambda has &lt;code&gt;s3:PutObject&lt;/code&gt; in its IAM permissions for the target resource (if using assumed role pattern, ensure assume role is allowed).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Check VPC / endpoint restrictions&lt;/strong&gt;: Access Points can be restricted to VPCs — confirm the call originates from an allowed place.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reproduce with AWS CLI&lt;/strong&gt; using the same ARN to see exact error:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; arn:aws:s3:us-east-1:DEST:accesspoint/APNAME &lt;span class="nt"&gt;--key&lt;/span&gt; t.txt &lt;span class="nt"&gt;--body&lt;/span&gt; t.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="9"&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fix policy, then re-test&lt;/strong&gt;. If still failing, enable S3 Server Access Logging or review CloudTrail events.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Avoid masking:&lt;/strong&gt; update Lambda error handling to surface S3 error codes to client and log full stack trace.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  8. Prevention: automation, monitoring &amp;amp; best practices
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Automated checks&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add CI/CD checks that verify Access Point policies and bucket policies include the required principals for cross-account flows, e.g. run &lt;code&gt;aws s3control get-access-point-policy&lt;/code&gt; in a test job and compare the result against the expected principals.&lt;/li&gt;
&lt;li&gt;Use infrastructure as code (Terraform/CloudFormation) for Access Points and policies so cross-account principals are reviewed and versioned.&lt;/li&gt;
&lt;/ul&gt;
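&lt;p&gt;For example, a CI job could pipe the output of &lt;code&gt;aws s3control get-access-point-policy&lt;/code&gt; into a small checker. Below is a hypothetical Node.js helper (the function name is made up for this sketch; the policy shown is the example from section 4.2):&lt;/p&gt;

```javascript
// Hypothetical CI helper: verify a policy document contains an Allow
// statement naming the expected principal ARN and action.
function policyAllowsPrincipal(policyDoc, roleArn, action) {
  const statements = [].concat(policyDoc.Statement || []);
  return statements.some(function (stmt) {
    if (stmt.Effect !== "Allow") { return false; }
    const principals = [].concat((stmt.Principal || {}).AWS || []);
    const actions = [].concat(stmt.Action || []);
    if (principals.indexOf(roleArn) === -1) { return false; }
    return actions.indexOf(action) !== -1;
  });
}

// Example input: the minimal Access Point policy from section 4.2.
const policy = {
  Version: "2012-10-17",
  Statement: [{
    Sid: "AllowLambdaPutFromAccountA",
    Effect: "Allow",
    Principal: { AWS: "arn:aws:iam::111122223333:role/lambda-exec-role" },
    Action: ["s3:PutObject", "s3:PutObjectAcl"],
    Resource: "arn:aws:s3:us-east-1:222233334444:accesspoint/app-uploads/object/*"
  }]
};

console.log(policyAllowsPrincipal(policy,
  "arn:aws:iam::111122223333:role/lambda-exec-role", "s3:PutObject")); // true
console.log(policyAllowsPrincipal(policy,
  "arn:aws:iam::999988887777:role/other-role", "s3:PutObject"));       // false
```

&lt;p&gt;Failing the pipeline when this check returns &lt;code&gt;false&lt;/code&gt; catches a missing principal before deployment rather than at runtime.&lt;/p&gt;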

&lt;p&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enable S3 Data Events in CloudTrail for sensitive buckets — this records &lt;code&gt;PutObject&lt;/code&gt; and will show &lt;code&gt;AccessDenied&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Enable S3 Server Access Logging on the bucket for additional forensic data.&lt;/li&gt;
&lt;li&gt;Log Lambda exceptions and include error codes; make them searchable in CloudWatch Logs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Policy hygiene&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prefer least privilege: grant only the actions required (&lt;code&gt;s3:PutObject&lt;/code&gt;, possibly &lt;code&gt;s3:PutObjectAcl&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Where possible use &lt;code&gt;Condition&lt;/code&gt; elements (&lt;code&gt;aws:SourceAccount&lt;/code&gt;, &lt;code&gt;aws:SourceArn&lt;/code&gt;) to reduce risk.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Testing&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add integration tests that perform a real PutObject via the Access Point as part of deployment pipelines (run under a sandbox account).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  9. Example: the assume-role alternative (step by step)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. In destination account (Account B)&lt;/strong&gt; create role &lt;code&gt;CrossAccountS3Writer&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trust policy grants &lt;code&gt;sts:AssumeRole&lt;/code&gt; to the source account (Account A) or the Lambda role.&lt;/li&gt;
&lt;li&gt;Permissions: &lt;code&gt;s3:PutObject&lt;/code&gt; on &lt;code&gt;arn:aws:s3:::my-bucket/*&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
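&lt;p&gt;For illustration, the trust policy on &lt;code&gt;CrossAccountS3Writer&lt;/code&gt; might look like the following (the account ID and role name are placeholders, matching the examples above):&lt;/p&gt;

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:role/lambda-exec-role"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```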

&lt;p&gt;&lt;strong&gt;2. In source account (Lambda)&lt;/strong&gt; use STS to assume &lt;code&gt;CrossAccountS3Writer&lt;/code&gt; and call S3 using those temporary credentials.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why:&lt;/strong&gt; avoids scattering destination resource policies referencing many source principals; centralizes control in the destination account.&lt;/p&gt;

&lt;h2&gt;
  
  
  10. Short checklist
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Identify if using Access Point ARN or direct bucket ARN.&lt;/li&gt;
&lt;li&gt;Check the Access Point policy in the destination account.&lt;/li&gt;
&lt;li&gt;Check the bucket policy for &lt;code&gt;PutObject&lt;/code&gt; allow/deny.&lt;/li&gt;
&lt;li&gt;Confirm the principal ARN (Lambda role) is included in the Access Point/bucket policy, or that role assumption is configured.&lt;/li&gt;
&lt;li&gt;Reproduce with the AWS CLI.&lt;/li&gt;
&lt;li&gt;Fix the policy or implement assume-role, then retest.&lt;/li&gt;
&lt;li&gt;Add automated tests &amp;amp; CI checks.&lt;/li&gt;
&lt;li&gt;Improve error handling so an S3 403 becomes a client 403, not a 500.&lt;/li&gt;
&lt;li&gt;Document in the team KB with example policies.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  11. Recommendations (practical &amp;amp; actionable)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Short term:&lt;/strong&gt; Fix the Access Point policy to include the Lambda role ARN and re-test. Update Lambda to surface 403s clearly. Add a short KB entry referencing this incident.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Medium term:&lt;/strong&gt; Decide a cross-account access pattern for your org (Access Point policy vs assume-role vs presigned URLs). Standardize it and codify with IaC.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Long term:&lt;/strong&gt; Automate policy verification in CI, enable CloudTrail S3 data events for critical buckets, and add integration test coverage that performs a test upload through the exact path used in production.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
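&lt;p&gt;The long-term CI check could begin as a plain verification that the Access Point policy document (fetched, for example, with the &lt;code&gt;s3control&lt;/code&gt; API's &lt;code&gt;get_access_point_policy&lt;/code&gt;) names the expected principal. A minimal, illustrative sketch; real policies may also use conditions or wildcards this does not cover:&lt;/p&gt;

```python
import json

def policy_allows_principal(policy_json, role_arn):
    """Return True if any Allow statement in the policy document
    explicitly names the given role ARN as an AWS principal."""
    statements = json.loads(policy_json).get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear as a bare object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        aws = principal.get("AWS", []) if isinstance(principal, dict) else []
        if isinstance(aws, str):
            aws = [aws]
        if role_arn in aws:
            return True
    return False
```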

&lt;h2&gt;
  
  
  12. Conclusion
&lt;/h2&gt;

&lt;p&gt;This was a classic example of &lt;strong&gt;policy layering&lt;/strong&gt; catching teams off guard: even with a permissive bucket policy and a correctly configured IAM role in the caller account, the Access Point — being a separate resource — enforces its own policy. The symptom (500) masked the true permission error (403) until every hop was traced and the Access Point policy was checked.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>api</category>
      <category>discuss</category>
      <category>devops</category>
    </item>
    <item>
      <title>OpenAI Launches ChatGPT Go in India 🇮🇳</title>
      <dc:creator>Ismail Kovvuru</dc:creator>
      <pubDate>Tue, 19 Aug 2025 09:24:47 +0000</pubDate>
      <link>https://dev.to/ismailkovvuru/openai-launches-chatgpt-go-in-india-1ggh</link>
      <guid>https://dev.to/ismailkovvuru/openai-launches-chatgpt-go-in-india-1ggh</guid>
      <description>&lt;p&gt;&lt;em&gt;OpenAI launches ChatGPT Go in India — a new affordable subscription plan with 10x more messages, images, and uploads, plus UPI payments. Available now for just ₹399.&lt;/em&gt;  &lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing ChatGPT Go in India
&lt;/h2&gt;

&lt;p&gt;OpenAI is excited to announce the launch of &lt;strong&gt;ChatGPT Go&lt;/strong&gt; in India, a brand-new subscription tier designed to make advanced AI features more accessible and affordable.  &lt;/p&gt;

&lt;p&gt;Nick Turley, Head of ChatGPT, announced ChatGPT Go’s launch in India on LinkedIn, highlighting OpenAI’s commitment to making advanced AI more affordable and accessible. Here is what he shared:&lt;/p&gt;

&lt;p&gt;India is one of the fastest-growing communities of ChatGPT users in the world. From students and professionals to creators and entrepreneurs, people are using ChatGPT every day to &lt;strong&gt;learn, write, design, and build&lt;/strong&gt;. One of the top requests from our Indian users has been: &lt;em&gt;“Can you make premium ChatGPT features more affordable and easier to subscribe to locally?”&lt;/em&gt;  &lt;/p&gt;

&lt;p&gt;With &lt;strong&gt;ChatGPT Go&lt;/strong&gt;, we are delivering on that request.  &lt;/p&gt;

&lt;h2&gt;
  
  
  What ChatGPT Go Offers
&lt;/h2&gt;

&lt;p&gt;Subscribers of ChatGPT Go in India can now enjoy major enhancements compared with the free tier:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10x higher message limits&lt;/strong&gt; – have extended conversations for deeper learning and projects.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10x more image generations&lt;/strong&gt; – create visuals, mockups, and ideas without worrying about running out.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10x more file uploads&lt;/strong&gt; – work seamlessly with larger and more complex documents.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2x longer memory&lt;/strong&gt; – experience smarter, more personalized interactions, with ChatGPT remembering more across sessions.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And all of this is available at just &lt;strong&gt;₹399 per month&lt;/strong&gt;.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Easier Subscriptions with INR and UPI
&lt;/h2&gt;

&lt;p&gt;To make the process seamless, we’ve localized ChatGPT subscriptions in India:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All plans are now priced in &lt;strong&gt;Indian Rupees (INR)&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Payments can be made conveniently through &lt;strong&gt;UPI&lt;/strong&gt;, India’s most widely used digital payment system.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This ensures that upgrading is as simple as sending a UPI payment — no international barriers, no complexity.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Why We Chose India First
&lt;/h2&gt;

&lt;p&gt;We’re launching ChatGPT Go in India first because of the country’s vibrant AI adoption. Millions of people here use ChatGPT for &lt;strong&gt;learning, productivity, and creativity&lt;/strong&gt;, and affordability has been one of the most requested improvements.  &lt;/p&gt;

&lt;p&gt;Starting in India allows us to learn directly from this growing community before expanding the plan to other countries.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Our Commitment
&lt;/h2&gt;

&lt;p&gt;OpenAI’s mission is to make AI &lt;strong&gt;useful, safe, and accessible&lt;/strong&gt; to as many people as possible. The launch of ChatGPT Go in India is part of that vision, giving more power and flexibility to users who want to do more with AI — at a price that works locally.  &lt;/p&gt;

&lt;p&gt;We look forward to hearing from our Indian community, learning from feedback, and continuing to improve ChatGPT for everyone.  &lt;/p&gt;

</description>
      <category>openai</category>
      <category>chatgpt</category>
      <category>discuss</category>
      <category>ai</category>
    </item>
    <item>
      <title>AWS Lambda Response Streaming Now Supports 200 MB Payloads — 10× More!</title>
      <dc:creator>Ismail Kovvuru</dc:creator>
      <pubDate>Tue, 12 Aug 2025 08:25:41 +0000</pubDate>
      <link>https://dev.to/ismailkovvuru/aws-lambda-response-streaming-now-supports-200-mb-payloads-10x-more-2o0o</link>
      <guid>https://dev.to/ismailkovvuru/aws-lambda-response-streaming-now-supports-200-mb-payloads-10x-more-2o0o</guid>
      <description>&lt;p&gt;&lt;strong&gt;Big news for serverless developers&lt;/strong&gt; — AWS Lambda’s &lt;strong&gt;response streaming limit&lt;/strong&gt; has jumped from &lt;strong&gt;20 MB → 200 MB&lt;/strong&gt; by default.&lt;br&gt;&lt;br&gt;
That’s a &lt;strong&gt;10× improvement&lt;/strong&gt;, and it means &lt;em&gt;fewer workarounds, less infrastructure, and simpler code&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed?
&lt;/h2&gt;

&lt;p&gt;Previously, if your Lambda needed to return more than 20 MB (e.g., PDFs, images, analytics dumps, AI output), you had to:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Split or chunk the payload,
&lt;/li&gt;
&lt;li&gt;Compress it, or
&lt;/li&gt;
&lt;li&gt;Upload it to &lt;strong&gt;S3&lt;/strong&gt; and return a presigned URL.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That added &lt;em&gt;more code, higher latency, and extra moving parts&lt;/em&gt;.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
You can stream up to &lt;strong&gt;200 MB directly&lt;/strong&gt; from Lambda to your client.&lt;br&gt;&lt;br&gt;
No S3 handoff. No chunking. Just stream data as it’s ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Facts
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Limit Type&lt;/th&gt;
&lt;th&gt;New (Streaming)&lt;/th&gt;
&lt;th&gt;Old&lt;/th&gt;
&lt;th&gt;Applies To&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Response Size (Streaming)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;200 MB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;20 MB&lt;/td&gt;
&lt;td&gt;Node.js managed &amp;amp; custom runtimes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Response Size (Classic)&lt;/td&gt;
&lt;td&gt;6 MB&lt;/td&gt;
&lt;td&gt;6 MB&lt;/td&gt;
&lt;td&gt;Buffered responses only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input Payload Size&lt;/td&gt;
&lt;td&gt;6 MB&lt;/td&gt;
&lt;td&gt;6 MB&lt;/td&gt;
&lt;td&gt;Still capped&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The &lt;strong&gt;6 MB input limit&lt;/strong&gt; still exists for requests into Lambda.&lt;br&gt;&lt;br&gt;
For uploads, use S3 presigned URLs or direct integrations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;This change simplifies architectures for:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Media Delivery&lt;/strong&gt; — Serve full podcast episodes, videos, and big PDFs right from Lambda.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generative AI&lt;/strong&gt; — Stream long text, images, or audio without waiting for completion.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data-heavy APIs&lt;/strong&gt; — Deliver large CSVs, reports, or datasets without extra handling.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because it’s &lt;strong&gt;streaming&lt;/strong&gt;, users start receiving data immediately — improving &lt;em&gt;Time-to-First-Byte (TTFB)&lt;/em&gt; and responsiveness.&lt;/p&gt;
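&lt;p&gt;On the client side, that benefit shows up in how the response is consumed. A small stdlib-only Python sketch that processes bytes as they arrive instead of waiting for the full payload:&lt;/p&gt;

```python
from urllib.request import urlopen

def stream_to_file(url, path, chunk_size=64 * 1024):
    """Read an HTTP response incrementally and write it to disk.

    Each chunk is handled as soon as it arrives, so time-to-first-byte
    is independent of the total payload size."""
    total = 0
    with urlopen(url) as resp, open(path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            total += len(chunk)
    return total
```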

&lt;h2&gt;
  
  
  Developer Tips
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Use Streaming When:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Responses are large and sequential.&lt;/li&gt;
&lt;li&gt;Processing produces output progressively (AI or analytics).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Mind the Input Limit:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For uploads &amp;gt; 6 MB, use S3 presigned URLs or an API Gateway → S3 integration.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Runtime Support:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works out-of-the-box in &lt;strong&gt;Node.js managed runtimes&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Other languages require a &lt;strong&gt;custom runtime&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  In Summary
&lt;/h2&gt;

&lt;p&gt;AWS Lambda’s new &lt;strong&gt;200 MB response streaming&lt;/strong&gt; means:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Less code
&lt;/li&gt;
&lt;li&gt;Lower latency
&lt;/li&gt;
&lt;li&gt;Fewer AWS services to manage
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most big-response use cases, Lambda can now be a &lt;strong&gt;direct delivery engine&lt;/strong&gt; — no buckets, no detours. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; Combine with CloudFront or edge-optimized APIs for &lt;em&gt;blazing global performance&lt;/em&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>lambda</category>
      <category>discuss</category>
    </item>
    <item>
      <title>AWS Lambda Now Supports Avro for Kafka – No More Manual Deserialization</title>
      <dc:creator>Ismail Kovvuru</dc:creator>
      <pubDate>Thu, 31 Jul 2025 04:11:42 +0000</pubDate>
      <link>https://dev.to/ismailkovvuru/aws-lambda-now-supports-avro-for-kafka-no-more-manual-deserialization-3kpn</link>
      <guid>https://dev.to/ismailkovvuru/aws-lambda-now-supports-avro-for-kafka-no-more-manual-deserialization-3kpn</guid>
      <description>&lt;p&gt;AWS Lambda now natively supports &lt;strong&gt;Avro&lt;/strong&gt; serialization when consuming messages from &lt;strong&gt;Kafka (MSK or self-managed)&lt;/strong&gt;. No more custom deserialization, external Avro libraries, or complex schema plumbing — just plug in your schema and go.&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s New?
&lt;/h3&gt;

&lt;p&gt;AWS has introduced &lt;strong&gt;built-in Avro support&lt;/strong&gt; for Lambda when triggered by &lt;strong&gt;Kafka&lt;/strong&gt; topics. Previously, using Avro meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writing &lt;strong&gt;custom decoding code&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Including &lt;strong&gt;&lt;code&gt;avro-python3&lt;/code&gt;, &lt;code&gt;fastavro&lt;/code&gt;, or Java Avro libs&lt;/strong&gt; in your Lambda package&lt;/li&gt;
&lt;li&gt;Handling &lt;strong&gt;schema registry integration&lt;/strong&gt; manually&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slower cold starts&lt;/strong&gt; due to large deployment bundles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, Lambda does all this &lt;strong&gt;automatically&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Just point it at your Kafka topic&lt;/li&gt;
&lt;li&gt;Provide a schema reference (e.g., AWS Glue Schema Registry)&lt;/li&gt;
&lt;li&gt;Lambda will decode the Avro-encoded payload &lt;strong&gt;natively&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example Setup (Minimal Configuration)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"eventSource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"kafka"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"avroSchema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:glue:us-east-1:123456789012:schema/TelemetryEvent"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Lambda Code (Python Example)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;records&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Decoded payload:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# Avro already parsed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why This Is a Big Deal
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Benefits for DevOps &amp;amp; Engineers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Old Workflow&lt;/th&gt;
&lt;th&gt;New Native Support&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Custom Avro decoding logic&lt;/td&gt;
&lt;td&gt;Built-in decoding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extra library dependencies&lt;/td&gt;
&lt;td&gt;None needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bigger Lambda package&lt;/td&gt;
&lt;td&gt;Smaller, faster function&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manual schema registry calls&lt;/td&gt;
&lt;td&gt;Automatic schema parsing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Real-World Use Cases
&lt;/h3&gt;

&lt;p&gt;This update is &lt;strong&gt;critical for teams&lt;/strong&gt; dealing with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-volume &lt;strong&gt;real-time Kafka pipelines&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IoT event ingestion&lt;/strong&gt; and telemetry&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Financial systems&lt;/strong&gt; using Avro schemas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Microservices&lt;/strong&gt; with schema-first contracts&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Supported Schema Registries
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AWS Glue Schema Registry&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Confluent Schema Registry&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Versioning&lt;/li&gt;
&lt;li&gt;Compatibility (backward, forward, full)&lt;/li&gt;
&lt;li&gt;Schema evolution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So your Lambda function always works with the latest compatible schema version.&lt;/p&gt;
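&lt;p&gt;As an illustration (not the exact console flow), registering such a schema in the AWS Glue Schema Registry with boto3 could start from a request like the one below, passed to &lt;code&gt;glue.create_schema(**kwargs)&lt;/code&gt;. The registry name and record fields are assumptions matching the article's &lt;code&gt;TelemetryEvent&lt;/code&gt; example:&lt;/p&gt;

```python
import json

def avro_schema_request(registry, name, schema_dict, compatibility="BACKWARD"):
    """Build the keyword arguments for boto3's glue.create_schema call
    for an Avro-formatted schema (registry/name values are illustrative)."""
    return {
        "RegistryId": {"RegistryName": registry},
        "SchemaName": name,
        "DataFormat": "AVRO",
        "Compatibility": compatibility,
        "SchemaDefinition": json.dumps(schema_dict),
    }

# An illustrative telemetry record schema:
TELEMETRY_SCHEMA = {
    "type": "record",
    "name": "TelemetryEvent",
    "fields": [
        {"name": "device_id", "type": "string"},
        {"name": "temperature", "type": "double"},
    ],
}

# glue = boto3.client("glue")
# glue.create_schema(**avro_schema_request("telemetry", "TelemetryEvent", TELEMETRY_SCHEMA))
```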

&lt;h3&gt;
  
  
  Caveats &amp;amp; Best Practices
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Messages &lt;strong&gt;must be Avro-encoded&lt;/strong&gt; and registered in a supported registry.&lt;/li&gt;
&lt;li&gt;Stick to &lt;strong&gt;schema compatibility rules&lt;/strong&gt; to avoid decoding failures.&lt;/li&gt;
&lt;li&gt;This feature is &lt;strong&gt;specific to Kafka triggers&lt;/strong&gt; — not available for SQS, Kinesis, or DynamoDB streams.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  📚 Official AWS Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/with-msk.html" rel="noopener noreferrer"&gt;Lambda + Kafka Event Source Mapping Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/glue/latest/dg/schema-registry.html" rel="noopener noreferrer"&gt;AWS Glue Schema Registry Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.confluent.io/platform/current/schema-registry/" rel="noopener noreferrer"&gt;Confluent Schema Registry (for hybrid setups)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/compute/introducing-aws-lambda-native-support-for-avro-and-protobuf-formatted-apache-kafka-events/" rel="noopener noreferrer"&gt;AWS Lambda native support for Avro&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Thoughts
&lt;/h2&gt;

&lt;p&gt;This is a &lt;strong&gt;low-profile but high-impact update&lt;/strong&gt; from AWS. For teams working with &lt;strong&gt;real-time, schema-driven data flows&lt;/strong&gt;, this makes Lambda &lt;strong&gt;more production-ready&lt;/strong&gt; and &lt;strong&gt;DevOps-friendly&lt;/strong&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>kafka</category>
      <category>cloud</category>
      <category>devops</category>
    </item>
    <item>
      <title>Kubernetes App Slow? Fix DNS, Mesh &amp; Caching, Not Node Scaling</title>
      <dc:creator>Ismail Kovvuru</dc:creator>
      <pubDate>Wed, 30 Jul 2025 03:10:40 +0000</pubDate>
      <link>https://dev.to/ismailkovvuru/kubernetes-app-slow-fix-dns-mesh-cachingnot-node-scaling-4amj</link>
      <guid>https://dev.to/ismailkovvuru/kubernetes-app-slow-fix-dns-mesh-cachingnot-node-scaling-4amj</guid>
      <description>&lt;p&gt;A production Kubernetes application started showing latency issues during peak hours. User reports flagged slow page loads and inconsistent response times.&lt;/p&gt;

&lt;p&gt;The infrastructure team’s initial reaction was to add more nodes to the cluster. But throwing compute at a latency issue is inefficient and costly, so before provisioning additional resources a deeper inspection was performed.&lt;/p&gt;
&lt;h2&gt;
  
  
  Root Causes Identified:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Too many service hops&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CoreDNS misconfigurations&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No caching for repeated API calls&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Real Solutions (Not More Nodes)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. &lt;strong&gt;Use a Service Mesh&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Why:&lt;/strong&gt;&lt;br&gt;
Service meshes like &lt;strong&gt;Istio&lt;/strong&gt; or &lt;strong&gt;Linkerd&lt;/strong&gt; reduce latency by enabling intelligent routing, retries, timeouts, and circuit breaking — optimizing pod-to-pod communication.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Commands (Istio example):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Istio&lt;/span&gt;
istioctl &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;profile&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;demo &lt;span class="nt"&gt;-y&lt;/span&gt;

&lt;span class="c"&gt;# Enable automatic sidecar injection&lt;/span&gt;
kubectl label namespace default istio-injection&lt;span class="o"&gt;=&lt;/span&gt;enabled

&lt;span class="c"&gt;# Deploy your app with mesh support&lt;/span&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; your-app-deployment.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. &lt;strong&gt;Fix CoreDNS Configuration&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Why:&lt;/strong&gt;&lt;br&gt;
Misconfigured CoreDNS leads to excessive lookups, especially if &lt;code&gt;upstream&lt;/code&gt;/&lt;code&gt;loop&lt;/code&gt; plugins are misused or timeouts are high.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inspect CoreDNS logs:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl logs &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system &lt;span class="nt"&gt;-l&lt;/span&gt; k8s-app&lt;span class="o"&gt;=&lt;/span&gt;kube-dns
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Edit CoreDNS ConfigMap:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl edit configmap coredns &lt;span class="nt"&gt;-n&lt;/span&gt; kube-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Optimizations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set appropriate TTLs:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  cache 30
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;Minimize &lt;code&gt;forward&lt;/code&gt; retries:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  forward . /etc/resolv.conf {
    max_concurrent 1000
  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Add Caching for Repeated API Calls&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Why:&lt;/strong&gt;&lt;br&gt;
If microservices make repeated calls to the same APIs (e.g., auth, config, pricing), caching avoids redundant processing and DNS lookups.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Options:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In-process memory cache (&lt;code&gt;LRU&lt;/code&gt;) or an external cache such as &lt;code&gt;Redis&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Sidecar caching with tools like &lt;strong&gt;Varnish&lt;/strong&gt; or &lt;strong&gt;NGINX&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example using Redis:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Python Flask example
&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StrictRedis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;redis&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6379&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/get-price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_price&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;
    &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_price_from_db&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why Not Add Nodes?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Slowness here is due to &lt;strong&gt;latency&lt;/strong&gt;, not &lt;strong&gt;resource exhaustion&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Adding nodes increases cost without resolving the actual bottlenecks.&lt;/li&gt;
&lt;li&gt;Smart tuning of networking and caching brings greater results for less overhead.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Only These Solutions?
&lt;/h2&gt;

&lt;p&gt;These three changes gave maximum impact with minimal cost:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| Issue                     | Solution       | Reason Chosen                           |
| ------------------------- | -------------- | --------------------------------------- |
| Excessive pod-to-pod hops | Service Mesh   | Centralized control + efficient routing |
| DNS resolution delays     | CoreDNS tuning | Reduced lookup overhead                 |
| Repeated API calls        | API Caching    | Faster responses + reduced backend load |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Are There Better Alternatives?
&lt;/h2&gt;

&lt;p&gt;Other options like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upgrading to &lt;strong&gt;Cilium&lt;/strong&gt; for eBPF-based networking.&lt;/li&gt;
&lt;li&gt;Using &lt;strong&gt;Headless Services&lt;/strong&gt; to bypass kube-proxy.&lt;/li&gt;
&lt;li&gt;Tuning &lt;strong&gt;kube-proxy&lt;/strong&gt; to reduce &lt;code&gt;iptables&lt;/code&gt; hops.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But those are deeper infra-level changes. For most real-world apps, the &lt;strong&gt;mesh + DNS fix + caching&lt;/strong&gt; strategy solves 80% of latency complaints &lt;strong&gt;without scaling costs&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Always Measure Before Scaling
&lt;/h2&gt;

&lt;p&gt;Before scaling compute nodes, check usage metrics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl top pods --all-namespaces

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Final Takeaway
&lt;/h2&gt;

&lt;p&gt;Before scaling your Kubernetes cluster, &lt;strong&gt;optimize what you already have&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Service mesh for communication efficiency&lt;/li&gt;
&lt;li&gt;CoreDNS tuning to reduce DNS latency&lt;/li&gt;
&lt;li&gt;Caching to eliminate repetitive calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are &lt;strong&gt;network-aware, cost-effective, and production-ready&lt;/strong&gt; solutions that bring measurable performance improvements.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>networking</category>
      <category>cloud</category>
      <category>aws</category>
    </item>
    <item>
      <title>AWS Docs MCP Server for DevOps Assistance</title>
      <dc:creator>Ismail Kovvuru</dc:creator>
      <pubDate>Tue, 29 Jul 2025 05:00:41 +0000</pubDate>
      <link>https://dev.to/ismailkovvuru/aws-docs-mcp-server-for-devops-assistance-1bo4</link>
      <guid>https://dev.to/ismailkovvuru/aws-docs-mcp-server-for-devops-assistance-1bo4</guid>
      <description>&lt;h2&gt;
  
  
  Using AWS Docs MCP Server for Accurate DevOps Assistance
&lt;/h2&gt;

&lt;p&gt;As DevOps engineers, we constantly search for &lt;strong&gt;precise AWS documentation&lt;/strong&gt; while configuring services like EC2, IAM, or Lambda. However, general-purpose AI tools often hallucinate, and navigating docs manually slows us down.&lt;/p&gt;

&lt;p&gt;Enter the &lt;strong&gt;AWS Docs MCP Server&lt;/strong&gt; — a minimal, reliable tool from &lt;strong&gt;AWS Labs&lt;/strong&gt; that lets you query documentation locally with &lt;strong&gt;zero hallucination&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Whether you're integrating it with &lt;strong&gt;Amazon Q&lt;/strong&gt;, using it inside &lt;strong&gt;VS Code&lt;/strong&gt;, or embedding it into CLI tooling, this server returns &lt;strong&gt;direct responses from official AWS docs&lt;/strong&gt;, fast and accurately.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is AWS Docs MCP Server?
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Model Context Protocol (MCP) server&lt;/strong&gt; for AWS documentation is an open-source server provided by AWS Labs. It's designed to return &lt;strong&gt;real-time AWS documentation&lt;/strong&gt; in response to structured requests.&lt;/p&gt;

&lt;p&gt;It’s especially useful when paired with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Amazon Q Developer Agent&lt;/strong&gt; in VS Code&lt;/li&gt;
&lt;li&gt;Custom CLI tools&lt;/li&gt;
&lt;li&gt;IDE integrations or doc-bots&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of relying on generative models, it gives back only what's in the AWS documentation — &lt;strong&gt;no more guessing&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Set It Up
&lt;/h2&gt;

&lt;p&gt;You can configure the server using this JSON block (used in tools like Amazon Q):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"awslabs.aws-documentation-mcp-server"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uvx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"awslabs.aws-documentation-mcp-server@latest"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"FASTMCP_LOG_LEVEL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ERROR"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"AWS_DOCUMENTATION_PARTITION"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"aws"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"disabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"autoApprove"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What This Configuration Does
&lt;/h2&gt;

&lt;p&gt;Let’s break it down:&lt;/p&gt;

&lt;h3&gt;
  
  
  What it does:
&lt;/h3&gt;

&lt;p&gt;This configuration tells your system (or a dev assistant like Amazon Q) to &lt;strong&gt;connect to the AWS Docs MCP Server&lt;/strong&gt;, which answers queries using &lt;strong&gt;real AWS documentation&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why we're doing it:
&lt;/h3&gt;

&lt;p&gt;Because we want &lt;strong&gt;factual, fast, non-AI-generated answers&lt;/strong&gt; when we ask questions about AWS services — especially for DevOps use cases like configuring IAM policies, EC2 launch templates, or CloudFormation syntax.&lt;/p&gt;

&lt;h3&gt;
  
  
  What output you’ll get:
&lt;/h3&gt;

&lt;p&gt;Once active, this server returns structured responses like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The exact syntax or example for a specific AWS CLI command&lt;/li&gt;
&lt;li&gt;JSON or YAML config snippets from AWS docs&lt;/li&gt;
&lt;li&gt;Official links and metadata from AWS documentation&lt;/li&gt;
&lt;li&gt;Answers scoped only to &lt;strong&gt;actual AWS services&lt;/strong&gt;, nothing made up&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Explanation of Each Field
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;What it Means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;command&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Runs the MCP server with &lt;code&gt;uvx&lt;/code&gt;, the tool runner that ships with the Python package manager &lt;code&gt;uv&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;args&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Downloads and runs the latest version of the AWS Docs MCP Server.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;env.FASTMCP_LOG_LEVEL&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sets log level to &lt;code&gt;ERROR&lt;/code&gt; (suppress warnings/info logs).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;env.AWS_DOCUMENTATION_PARTITION&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Specifies that it should fetch documentation only for the public AWS partition.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;disabled&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;When &lt;code&gt;false&lt;/code&gt;, keeps the server active.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;autoApprove&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Used to control whether requests are auto-approved (leave empty for manual).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
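&lt;p&gt;If you manage assistant config files from a script, the same block can be merged in programmatically. This is a minimal sketch; the config file location varies by tool, so the path is taken as a parameter:&lt;/p&gt;

```python
import json
import os

# The server entry from the JSON block above.
SERVER_BLOCK = {
    "awslabs.aws-documentation-mcp-server": {
        "command": "uvx",
        "args": ["awslabs.aws-documentation-mcp-server@latest"],
        "env": {
            "FASTMCP_LOG_LEVEL": "ERROR",
            "AWS_DOCUMENTATION_PARTITION": "aws",
        },
        "disabled": False,
        "autoApprove": [],
    }
}

def merge_mcp_config(path):
    """Insert or refresh the server entry under "mcpServers" in a JSON config file."""
    config = {}
    if os.path.exists(path):
        with open(path) as f:
            config = json.load(f)
    config.setdefault("mcpServers", {}).update(SERVER_BLOCK)
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return config
```

&lt;p&gt;Calling &lt;code&gt;merge_mcp_config&lt;/code&gt; is idempotent, so rerunning it simply refreshes the entry without disturbing other servers in the file.&lt;/p&gt;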

&lt;h2&gt;
  
  
  Why It Matters for DevOps
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benefit&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;No hallucination&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data is pulled directly from AWS docs — no assumptions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fast&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lightweight server optimized for local/dev workflows.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pluggable&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Easily integrates with IDEs, terminals, or dev assistants like Amazon Q.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Expandable&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Labs also provides MCPs for other services (like DynamoDB, CloudWatch, etc.).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Glossary of Terms Used
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Model Context Protocol, an open standard that lets AI assistants and dev tools query structured context servers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon Q&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS’s AI-powered coding assistant (like Copilot or ChatGPT but AWS-specific)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;uvx&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A command-line runner from the Python package manager &lt;code&gt;uv&lt;/code&gt; that downloads and executes tools such as MCP servers in isolated environments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Partition&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Refers to the AWS partition (e.g., &lt;code&gt;aws&lt;/code&gt;, &lt;code&gt;aws-cn&lt;/code&gt;, &lt;code&gt;aws-us-gov&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FASTMCP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;FastMCP, the lightweight Python framework used to build these MCP servers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hallucination&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;When AI generates false or made-up information&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;autoApprove&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Controls whether the MCP server’s tool requests are approved automatically or require manual confirmation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Labs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS’s GitHub organization for experimental/open-source tools&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Quick Start Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Example Usage with Amazon Q: &lt;a href="https://awslabs.github.io/mcp/servers/aws-documentation-mcp-server/" rel="noopener noreferrer"&gt;AWS MCP Servers&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Thoughts
&lt;/h2&gt;

&lt;p&gt;If you're building or maintaining AWS infrastructure, you &lt;strong&gt;need reliable answers fast&lt;/strong&gt;. This tool gives you that — straight from the source.&lt;/p&gt;

&lt;p&gt;Whether you're writing CloudFormation, troubleshooting S3 policies, or scripting with Boto3, the AWS Docs MCP Server becomes a &lt;strong&gt;trusted backend&lt;/strong&gt; that supercharges your DevOps workflows.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tried it out?&lt;/strong&gt; Share your integrations or feedback in the comments!&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>mcp</category>
      <category>ai</category>
    </item>
    <item>
      <title>Using Amazon EFS with AWS Lambda: Persistent File Storage</title>
      <dc:creator>Ismail Kovvuru</dc:creator>
      <pubDate>Sun, 27 Jul 2025 16:07:32 +0000</pubDate>
      <link>https://dev.to/ismailkovvuru/using-amazon-efs-with-aws-lambda-persistent-file-storage-7i7</link>
      <guid>https://dev.to/ismailkovvuru/using-amazon-efs-with-aws-lambda-persistent-file-storage-7i7</guid>
      <description>&lt;p&gt;Unlock persistent, low-latency storage in AWS Lambda using Amazon EFS. This 2025-ready guide covers real-world use cases, step-by-step Terraform and CloudFormation examples, performance tuning, cost comparisons (vs S3, /tmp, Elasticache), and DevOps best practices for scalable serverless architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS Lambda + EFS: Scalable File Storage for Serverless Workloads
&lt;/h2&gt;

&lt;p&gt;When we think of AWS Lambda, we often imagine stateless, short-lived functions with tight constraints on storage and memory. But what if your function needs to read or write persistent data across multiple invocations? Enter &lt;strong&gt;Amazon EFS (Elastic File System)&lt;/strong&gt; — AWS’s fully managed NFS solution that can be mounted directly to your Lambda functions within a VPC.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means in Simple Terms
&lt;/h2&gt;

&lt;p&gt;By mounting EFS to your Lambda function, you unlock:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; &lt;strong&gt;Persistent storage&lt;/strong&gt; — survives between invocations&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Shared access&lt;/strong&gt; — multiple Lambdas, containers, and EC2s can access the same filesystem&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Large file support&lt;/strong&gt; — process GB-level datasets, ML models, PDFs, images, and more&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Faster ML inference or image manipulation&lt;/strong&gt; — with mounted models or binaries&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why /tmp Isn’t Enough in Lambda
&lt;/h2&gt;

&lt;p&gt;Lambda’s /tmp directory:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is ephemeral — data is wiped once the execution environment is reclaimed&lt;/li&gt;
&lt;li&gt;Is limited to 512MB by default (ephemeral storage can be raised to 10GB, at extra cost)&lt;/li&gt;
&lt;li&gt;Cannot be shared across Lambda instances or invocations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your application needs persistent storage, multi-function collaboration, or handling large files, /tmp becomes a serious bottleneck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Is Amazon EFS?&lt;/strong&gt;&lt;br&gt;
Amazon EFS is a fully managed, elastic, network file system accessible from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EC2&lt;/li&gt;
&lt;li&gt;ECS/Fargate&lt;/li&gt;
&lt;li&gt;Lambda (via VPC &amp;amp; access point)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With EFS:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can store unlimited files with standard POSIX permissions&lt;/li&gt;
&lt;li&gt;Mount the same filesystem across services&lt;/li&gt;
&lt;li&gt;Pay only for what you use (GB/month + I/O if using provisioned mode)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Why Use Amazon EFS with Lambda?
&lt;/h2&gt;

&lt;p&gt;EFS makes serverless Lambda functions stateful and collaborative. Key advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Persistent Storage – Data stays even after Lambda shuts down&lt;/li&gt;
&lt;li&gt;Large File Support – No 512MB cap like /tmp&lt;/li&gt;
&lt;li&gt;Shared Access – Share data across multiple Lambda functions and invocations&lt;/li&gt;
&lt;li&gt;Zero Manual Scaling – Automatically grows with usage&lt;/li&gt;
&lt;li&gt;POSIX File Permissions – Secure multi-tenant file access&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Cost Comparison: EFS vs S3 vs /tmp vs ElastiCache
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Storage Type&lt;/th&gt;
&lt;th&gt;Pricing&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EFS (Standard)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~$0.30/GB/month + I/O (if provisioned throughput is enabled)&lt;/td&gt;
&lt;td&gt;Scalable, shared, persistent, POSIX&lt;/td&gt;
&lt;td&gt;Costlier than S3; requires VPC networking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;S3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~$0.023/GB/month&lt;/td&gt;
&lt;td&gt;Durable, cheap, static hosting&lt;/td&gt;
&lt;td&gt;Not mountable as a filesystem; Lambda must go through the SDK/API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lambda &lt;code&gt;/tmp&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free up to 512MB&lt;/td&gt;
&lt;td&gt;Fastest, local&lt;/td&gt;
&lt;td&gt;Ephemeral, size-limited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ElastiCache&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~$0.02/GB/hour&lt;/td&gt;
&lt;td&gt;Low-latency, real-time caching&lt;/td&gt;
&lt;td&gt;In-memory only, not persistent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
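&lt;p&gt;As a quick sanity check, the approximate rates in the table translate into a simple back-of-the-envelope calculation (these are the table’s estimates, not live AWS pricing):&lt;/p&gt;

```python
# Back-of-the-envelope monthly storage cost, using the table's
# approximate rates (verify against the current AWS pricing pages).
EFS_PER_GB_MONTH = 0.30    # EFS Standard, ~$/GB-month
S3_PER_GB_MONTH = 0.023    # S3 Standard, ~$/GB-month

def monthly_storage_cost(gb: float, rate_per_gb: float) -> float:
    """Storage-only cost; excludes I/O, requests, and data transfer."""
    return round(gb * rate_per_gb, 2)

if __name__ == "__main__":
    for gb in (10, 100):
        print(f"{gb} GB -> EFS ${monthly_storage_cost(gb, EFS_PER_GB_MONTH)}, "
              f"S3 ${monthly_storage_cost(gb, S3_PER_GB_MONTH)}")
```

&lt;p&gt;At 100GB that works out to roughly $30/month on EFS versus about $2.30 on S3, which is why EFS is best reserved for workloads that genuinely need a shared POSIX filesystem.&lt;/p&gt;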
&lt;h2&gt;
  
  
  How to Use EFS with Lambda – Step-by-Step
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Existing Lambda function&lt;/li&gt;
&lt;li&gt;VPC with private subnets&lt;/li&gt;
&lt;li&gt;EFS in the same region and VPC&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Setup Steps&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create EFS File System&lt;/li&gt;
&lt;li&gt;Create EFS Access Point&lt;/li&gt;
&lt;li&gt;Configure Mount Targets (across AZs)&lt;/li&gt;
&lt;li&gt;Update Security Groups (allow NFS from Lambda to EFS)&lt;/li&gt;
&lt;li&gt;Attach Lambda to VPC &amp;amp; Mount via Access Point&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Python Sample (Writing to EFS):
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with open("/mnt/efs/mylog.txt", "a") as f:
    f.write("Function invoked at: {}\n".format(datetime.now()))

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
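&lt;p&gt;A slightly fuller sketch of the same idea, with the mount path taken as a parameter so the logic can also be exercised outside Lambda (inside Lambda it would be the &lt;code&gt;/mnt/efs&lt;/code&gt; mount path):&lt;/p&gt;

```python
import os
from datetime import datetime

def append_and_count(mount_path="/mnt/efs", now=None):
    """Append one invocation line to the shared log, return total line count.

    mount_path defaults to the Lambda local_mount_path; pass a temp
    directory to exercise the logic outside Lambda.
    """
    log_file = os.path.join(mount_path, "mylog.txt")
    with open(log_file, "a") as f:
        f.write(f"Function invoked at: {now or datetime.now()}\n")
    with open(log_file) as f:
        return sum(1 for _ in f)

def lambda_handler(event, context):
    # The count keeps growing across invocations because EFS persists.
    return {"invocations_recorded": append_and_count()}
```

&lt;p&gt;Because the file lives on EFS rather than &lt;code&gt;/tmp&lt;/code&gt;, the count keeps climbing across cold starts and across concurrent execution environments.&lt;/p&gt;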

&lt;h2&gt;
  
  
  Real-World Use Case
&lt;/h2&gt;

&lt;p&gt;Let’s say you have a &lt;strong&gt;startup running ML inference via Lambda&lt;/strong&gt;. The ML model file is 400MB. Storing it in &lt;code&gt;/tmp&lt;/code&gt; (ephemeral, 512MB by default) or re-downloading it from S3 (slower, not POSIX-compliant) on every cold start doesn’t scale. With EFS mounted, your Lambda loads the model directly from the shared file system, cutting cold-start download time and improving inference speed.&lt;/p&gt;
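&lt;p&gt;That caching pattern can be sketched as a small helper. This is an illustrative sketch, not code from the article: the &lt;code&gt;fetch&lt;/code&gt; callback is a hypothetical stand-in for, say, a boto3 S3 download:&lt;/p&gt;

```python
import os

def ensure_model(model_name, fetch, mount_path="/mnt/efs"):
    """Return the path of a model file on EFS, fetching it only once.

    fetch(dest_path) is a caller-supplied callback (e.g. an S3 download);
    warm and concurrent invocations then reuse the cached copy.
    """
    dest = os.path.join(mount_path, model_name)
    if not os.path.exists(dest):
        tmp = dest + ".tmp"
        fetch(tmp)             # expensive download happens only once
        os.replace(tmp, dest)  # atomic rename so readers never see a partial file
    return dest
```

&lt;p&gt;The write-to-temp-then-rename step matters on a shared filesystem: another Lambda instance either sees no file (and fetches) or the complete file, never a half-written one.&lt;/p&gt;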
&lt;h2&gt;
  
  
  When Should You Use Lambda with EFS?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Should Use EFS?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ML model loading&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image/Video rendering (FFmpeg, PIL)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data sharing across Lambdas&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large file access during function&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small, stateless functions&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Low-latency DB-style caching&lt;/td&gt;
&lt;td&gt;Prefer ElastiCache or DynamoDB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Feature Comparison: EFS vs S3 vs /tmp vs ElastiCache
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;EFS&lt;/th&gt;
&lt;th&gt;S3&lt;/th&gt;
&lt;th&gt;/tmp&lt;/th&gt;
&lt;th&gt;ElastiCache&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Persistent&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Shared&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;POSIX-Compliant&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max File Size&lt;/td&gt;
&lt;td&gt;~47.9TiB per file&lt;/td&gt;
&lt;td&gt;5TB per object&lt;/td&gt;
&lt;td&gt;512MB (default)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speed&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Slower&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pricing Model&lt;/td&gt;
&lt;td&gt;Per GB/month + I/O&lt;/td&gt;
&lt;td&gt;Per request + storage&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Per node/hour&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Reference&lt;/strong&gt;: &lt;a href="https://aws.amazon.com/efs/pricing/" rel="noopener noreferrer"&gt;EFS Pricing&lt;/a&gt;, &lt;a href="https://aws.amazon.com/s3/pricing/" rel="noopener noreferrer"&gt;S3 Pricing&lt;/a&gt;, &lt;a href="https://aws.amazon.com/elasticache/pricing/" rel="noopener noreferrer"&gt;ElastiCache Pricing&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Terraform Script to Mount EFS with Lambda
&lt;/h2&gt;

&lt;p&gt;Here’s a representative snippet (replace the placeholder subnet IDs with your own before applying):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Create a VPC (or use an existing one)&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_vpc"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;cidr_block&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"10.0.0.0/16"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# EFS File System&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_efs_file_system"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_efs"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;performance_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"generalPurpose"&lt;/span&gt;
  &lt;span class="nx"&gt;lifecycle_policy&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;transition_to_ia&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AFTER_7_DAYS"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;throughput_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"bursting"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Create Mount Targets in each AZ subnet&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_efs_mount_target"&lt;/span&gt; &lt;span class="s2"&gt;"efs_mt"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;for_each&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;toset&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s2"&gt;"subnet-az1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"subnet-az2"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# Replace with real subnet IDs&lt;/span&gt;
  &lt;span class="nx"&gt;file_system_id&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_efs_file_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_efs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;subnet_id&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;
  &lt;span class="nx"&gt;security_groups&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;efs_sg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Security Group for EFS (Allow NFS)&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_security_group"&lt;/span&gt; &lt;span class="s2"&gt;"efs_sg"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;ingress&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;from_port&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2049&lt;/span&gt;
    &lt;span class="nx"&gt;to_port&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2049&lt;/span&gt;
    &lt;span class="nx"&gt;protocol&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"tcp"&lt;/span&gt;
    &lt;span class="nx"&gt;cidr_blocks&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"10.0.0.0/16"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Lambda Function Role&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"lambda_exec"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"lambda-exec-role"&lt;/span&gt;
  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Service&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"lambda.amazonaws.com"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Lambda Function&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_function"&lt;/span&gt; &lt;span class="s2"&gt;"efs_lambda"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;function_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"efs_lambda_demo"&lt;/span&gt;
  &lt;span class="nx"&gt;runtime&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"python3.9"&lt;/span&gt;
  &lt;span class="nx"&gt;handler&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"index.handler"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_exec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
  &lt;span class="nx"&gt;filename&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"lambda-deploy.zip"&lt;/span&gt;

  &lt;span class="nx"&gt;vpc_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;subnet_ids&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"subnet-az1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"subnet-az2"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;security_group_ids&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;aws_security_group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;efs_sg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;file_system_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;arn&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_efs_access_point&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
    &lt;span class="nx"&gt;local_mount_path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/mnt/efs"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# EFS Access Point&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_efs_access_point"&lt;/span&gt; &lt;span class="s2"&gt;"ap"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;file_system_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_efs_file_system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambda_efs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;posix_user&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;uid&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
    &lt;span class="nx"&gt;gid&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;root_directory&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/lambda"&lt;/span&gt;
    &lt;span class="nx"&gt;creation_info&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;owner_gid&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
      &lt;span class="nx"&gt;owner_uid&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
      &lt;span class="nx"&gt;permissions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"755"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Explanation of Key Sections
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;aws_efs_file_system&lt;/code&gt;: Creates the shared storage.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;aws_efs_mount_target&lt;/code&gt;: Mounts the EFS to subnets in your VPC (required).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;aws_efs_access_point&lt;/code&gt;: Simplifies access, ensures POSIX permissions.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;aws_lambda_function&lt;/code&gt;: The Lambda is now “wired” to use &lt;code&gt;/mnt/efs&lt;/code&gt; as a real folder.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;vpc_config&lt;/code&gt;: Required, since Lambda with EFS &lt;strong&gt;must run inside a VPC&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Output &amp;amp; Testing
&lt;/h2&gt;

&lt;p&gt;If done correctly, your Lambda will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write/read files to &lt;code&gt;/mnt/efs&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Persist across invocations&lt;/li&gt;
&lt;li&gt;Share data with other compute instances&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Test tip: Add &lt;code&gt;import os&lt;/code&gt; and &lt;code&gt;print(os.listdir("/mnt/efs"))&lt;/code&gt; to your handler to verify the mount works.&lt;/p&gt;
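&lt;p&gt;A minimal verification handler along those lines, with the mount path parameterized so the same function can be smoke-tested outside Lambda:&lt;/p&gt;

```python
import os

def verify_mount(mount_path="/mnt/efs"):
    """List the mounted directory; raises FileNotFoundError if the mount is missing."""
    entries = sorted(os.listdir(mount_path))
    print(entries)
    return {"mount": mount_path, "files": entries, "count": len(entries)}

def lambda_handler(event, context):
    return verify_mount()
```

&lt;p&gt;If the function errors instead of returning a listing, check the VPC config, security group rules on port 2049, and the Access Point path.&lt;/p&gt;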

&lt;h2&gt;
  
  
  Pre-Checks Before Using Lambda + EFS
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Checklist Item&lt;/th&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Must deploy Lambda &lt;strong&gt;inside a VPC&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Required for EFS access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ensure proper &lt;strong&gt;security groups&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;NFS (2049) must be open between Lambda and EFS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avoid &lt;strong&gt;cold start bloat&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Mounting adds latency; prefer provisioned concurrency for speed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set correct &lt;strong&gt;POSIX permissions&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Access Point must match your Lambda UID/GID&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  When &lt;em&gt;Not&lt;/em&gt; to Use This Setup
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;If your function runs fine within &lt;code&gt;/tmp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;If you're doing low-latency key-value access → use &lt;strong&gt;DynamoDB or ElastiCache&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;If you don’t want VPC complexity (adding NAT Gateway, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  CloudFormation Snippet: Mounting EFS to Lambda
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AWSTemplateFormatVersion: '2010-09-09'
Description: Attach Amazon EFS to AWS Lambda for shared persistent storage

Resources:

  MyVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: LambdaVPC

  MySubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref MyVPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [ 0, !GetAZs "" ]
      MapPublicIpOnLaunch: true

  MySecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow Lambda access to EFS
      VpcId: !Ref MyVPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 2049
          ToPort: 2049
          CidrIp: 0.0.0.0/0 # for public test only — restrict in prod

  MyEFS:
    Type: AWS::EFS::FileSystem
    Properties:
      Encrypted: true
      PerformanceMode: generalPurpose

  MyMountTarget:
    Type: AWS::EFS::MountTarget
    Properties:
      FileSystemId: !Ref MyEFS
      SubnetId: !Ref MySubnet1
      SecurityGroups:
        - !Ref MySecurityGroup

  MyAccessPoint:
    Type: AWS::EFS::AccessPoint
    Properties:
      FileSystemId: !Ref MyEFS
      PosixUser:
        Uid: "1000"
        Gid: "1000"
      RootDirectory:
        CreationInfo:
          OwnerUid: "1000"
          OwnerGid: "1000"
          Permissions: "750"
        Path: "/lambda"

  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole
        - arn:aws:iam::aws:policy/AmazonElasticFileSystemClientReadWriteAccess

  MyLambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: MyEFSLambda
      Runtime: python3.12
      Handler: index.lambda_handler
      Code:
        ZipFile: |
          import os
          def lambda_handler(event, context):
              with open("/mnt/efs/test.txt", "w") as f:
                  f.write("Hello from Lambda using EFS!")
              return "File written to EFS"
      Role: !GetAtt LambdaExecutionRole.Arn
      VpcConfig:
        SubnetIds:
          - !Ref MySubnet1
        SecurityGroupIds:
          - !Ref MySecurityGroup
      FileSystemConfigs:
        - Arn: !GetAtt MyAccessPoint.Arn
          LocalMountPath: /mnt/efs

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Explaining the Lambda + EFS CloudFormation Stack (Simplified Professional View)
&lt;/h2&gt;

&lt;p&gt;This CloudFormation setup provisions a &lt;strong&gt;serverless Lambda function connected to Amazon EFS&lt;/strong&gt;, enabling persistent, shared storage across invocations. Here's what each component does and why it matters:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Amazon EFS (&lt;code&gt;AWS::EFS::FileSystem&lt;/code&gt;)&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Provides a &lt;strong&gt;durable, shared NFS file system&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Ideal for &lt;strong&gt;ML models, media processing, or large binaries&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Mount Target&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Enables EFS access from a &lt;strong&gt;Lambda in a VPC&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;One target per AZ is needed.&lt;/li&gt;
&lt;li&gt;Ensures &lt;strong&gt;private, secure VPC routing&lt;/strong&gt; to EFS.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Access Point&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Defines a safe mount path (e.g., &lt;code&gt;/lambda&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Sets POSIX permissions to prevent access issues.&lt;/li&gt;
&lt;li&gt;AWS-recommended for &lt;strong&gt;multi-function access&lt;/strong&gt; and permission control.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Security Groups&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Allows &lt;strong&gt;port 2049 (NFS)&lt;/strong&gt; from Lambda to EFS.&lt;/li&gt;
&lt;li&gt;The EFS security group allows inbound NFS from the Lambda security group.&lt;/li&gt;
&lt;li&gt;Ensures &lt;strong&gt;secure, scoped connectivity&lt;/strong&gt; inside the VPC.&lt;/li&gt;
&lt;/ul&gt;
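&lt;p&gt;As a sketch, the pairing could look like this in CloudFormation (resource names such as &lt;code&gt;MyVPC&lt;/code&gt; and &lt;code&gt;EfsSecurityGroup&lt;/code&gt; are illustrative, not taken from the template above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  # Attached to the Lambda function's ENIs
  MySecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Lambda-to-EFS client SG
      VpcId: !Ref MyVPC

  # Attached to the EFS mount target; admits NFS (2049) only from the Lambda SG
  EfsSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow NFS from Lambda
      VpcId: !Ref MyVPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 2049
          ToPort: 2049
          SourceSecurityGroupId: !Ref MySecurityGroup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;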

&lt;h3&gt;
  
  
  5. &lt;strong&gt;IAM Role&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Grants &lt;strong&gt;VPC, EFS, and CloudWatch Logs&lt;/strong&gt; access.&lt;/li&gt;
&lt;li&gt;Follows least privilege: grant only the EFS client actions the function needs (e.g., &lt;code&gt;elasticfilesystem:ClientMount&lt;/code&gt;, &lt;code&gt;elasticfilesystem:ClientWrite&lt;/code&gt;) plus CloudWatch Logs write access.&lt;/li&gt;
&lt;/ul&gt;
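&lt;p&gt;If you prefer an inline least-privilege policy over the broad managed policies used in the template, it might look like this (a sketch; the account ID and file-system ID are placeholders to replace with your own):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:ClientMount",
        "elasticfilesystem:ClientWrite"
      ],
      "Resource": "arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/fs-EXAMPLE"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;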

&lt;h3&gt;
  
  
  6. &lt;strong&gt;Lambda Function&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Connects to private subnets + security group.&lt;/li&gt;
&lt;li&gt;Mounts EFS at &lt;code&gt;/mnt/efs&lt;/code&gt; using the Access Point.&lt;/li&gt;
&lt;li&gt;Can &lt;strong&gt;read/write files across invocations&lt;/strong&gt; — perfect for ML, media, or temp file sharing.&lt;/li&gt;
&lt;/ul&gt;
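&lt;p&gt;A minimal sketch of that read/write pattern in plain Python (the helper and file name are illustrative; inside Lambda, &lt;code&gt;base_dir&lt;/code&gt; would be &lt;code&gt;/mnt/efs&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os

def append_record(base_dir, name, line):
    # Append one line to a file on the shared mount, then return all
    # lines seen so far -- state accumulates across invocations because
    # EFS persists outside the Lambda execution environment.
    path = os.path.join(base_dir, name)
    with open(path, "a") as f:
        f.write(line + "\n")
    with open(path) as f:
        return f.read().splitlines()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;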

&lt;h3&gt;
  
  
  When to Use:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;ML inference with large model files.&lt;/li&gt;
&lt;li&gt;Persistent shared data (e.g., thumbnails, binaries).&lt;/li&gt;
&lt;li&gt;Temporary storage exceeding &lt;code&gt;/tmp&lt;/code&gt;’s default 512 MB (configurable up to 10 GB).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pre-checks:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Lambda must run in &lt;strong&gt;private subnets&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Ensure &lt;strong&gt;port 2049 is open&lt;/strong&gt; between SGs.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Access Points&lt;/strong&gt; to avoid permission issues.&lt;/li&gt;
&lt;li&gt;Role must have correct &lt;strong&gt;EFS + VPC permissions&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
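&lt;p&gt;One cheap runtime pre-check, in plain Python (a sketch; call it at the top of the handler with &lt;code&gt;/mnt/efs&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os

def efs_ready(mount_path):
    # True only if something is actually mounted at mount_path --
    # catches the case where the directory exists but the EFS
    # mount itself never happened.
    return os.path.ismount(mount_path)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;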

&lt;h2&gt;
  
  
  Recommendations &amp;amp; Best Practices
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Use Lambda + EFS?&lt;/th&gt;
&lt;th&gt;Why?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ML inference with &amp;gt;100MB models&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Faster load time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sharing temp files between functions&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Persistent access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Simple file logging&lt;/td&gt;
&lt;td&gt;Use CloudWatch&lt;/td&gt;
&lt;td&gt;Cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time caching&lt;/td&gt;
&lt;td&gt;Use ElastiCache&lt;/td&gt;
&lt;td&gt;Lower latency&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Key Concepts and Terminology
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lambda&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Serverless compute function that runs code in response to events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EFS (Elastic File System)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scalable, network-based file storage for AWS compute services&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VPC (Virtual Private Cloud)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Isolated network environment for deploying AWS resources&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Access Point&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Named mount path with identity and permissions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Throughput Mode&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Controls how EFS scales I/O (Bursting or Provisioned)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;/tmp&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lambda's ephemeral local storage (512 MB by default, configurable up to 10 GB), cleared when the execution environment is recycled&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ElastiCache&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;In-memory key-value store (e.g., Redis) used for caching/shared data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cold Start&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Latency on first invocation due to provisioning resources&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt; &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-filesystem.html" rel="noopener noreferrer"&gt;AWS Lambda EFS Official Guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; &lt;a href="https://aws.amazon.com/efs/pricing/" rel="noopener noreferrer"&gt;Amazon EFS Pricing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; &lt;a href="https://aws.amazon.com/s3/pricing/" rel="noopener noreferrer"&gt;Amazon S3 Pricing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; &lt;a href="https://aws.amazon.com/elasticache/pricing/" rel="noopener noreferrer"&gt;ElastiCache Pricing&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; &lt;a href="https://registry.terraform.io/modules/terraform-aws-modules/lambda/aws/latest" rel="noopener noreferrer"&gt;AWS Terraform EFS + Lambda Example&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; &lt;a href="https://docs.aws.amazon.com/efs/latest/ug/performance.html" rel="noopener noreferrer"&gt;EFS Performance Tips&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; &lt;a href="https://docs.aws.amazon.com/efs/latest/ug/efs-access-points.html" rel="noopener noreferrer"&gt;Lambda + EFS Access Point Best Practices&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>cloudcomputing</category>
      <category>lambda</category>
    </item>
    <item>
      <title>Skip OS Shutdown on EC2: Instantly Stop or Terminate Instances with AWS CLI v2.15+</title>
      <dc:creator>Ismail Kovvuru</dc:creator>
      <pubDate>Sun, 27 Jul 2025 14:56:47 +0000</pubDate>
      <link>https://dev.to/ismailkovvuru/skip-os-shutdown-on-ec2-instantly-stop-or-terminate-instances-with-aws-cli-v215-2po9</link>
      <guid>https://dev.to/ismailkovvuru/skip-os-shutdown-on-ec2-instantly-stop-or-terminate-instances-with-aws-cli-v215-2po9</guid>
      <description>&lt;p&gt;In 2025, AWS introduced a powerful feature for Amazon EC2 that allows you to &lt;strong&gt;skip the operating system shutdown process&lt;/strong&gt; during &lt;code&gt;stop&lt;/code&gt; or &lt;code&gt;terminate&lt;/code&gt; operations. Using the &lt;code&gt;--skip-os-shutdown&lt;/code&gt; flag, you can immediately shut down or terminate an EC2 instance—without waiting for in-OS cleanup scripts, disk flushes, or graceful exits.&lt;/p&gt;

&lt;p&gt;This flag is a game-changer for &lt;strong&gt;DevOps pipelines, failover automation, blue-green deployments&lt;/strong&gt;, and &lt;strong&gt;ephemeral test environments&lt;/strong&gt; where &lt;strong&gt;speed takes priority over control&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is &lt;code&gt;--skip-os-shutdown&lt;/code&gt;?
&lt;/h2&gt;

&lt;p&gt;By default, EC2 sends a signal to the guest operating system to gracefully shut down when you stop or terminate an instance. This allows the OS to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run shutdown scripts&lt;/li&gt;
&lt;li&gt;Flush memory to disk&lt;/li&gt;
&lt;li&gt;Notify monitoring agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With &lt;code&gt;--skip-os-shutdown&lt;/code&gt;, this signal is &lt;strong&gt;bypassed&lt;/strong&gt;. The instance is instantly powered off or terminated, just like yanking the power cord from a physical server.&lt;/p&gt;

&lt;h3&gt;
  
  
  CLI Example:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ec2 stop-instances &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--instance-ids&lt;/span&gt; i-1234567890abcdef0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--skip-os-shutdown&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Applies To:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;stop-instances&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;terminate-instances&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Available via:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS CLI (v2.15+)&lt;/li&gt;
&lt;li&gt;AWS Console&lt;/li&gt;
&lt;li&gt;SDKs (progressively being updated)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisite: AWS CLI v2.15 or Later
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;The &lt;code&gt;--skip-os-shutdown&lt;/code&gt; flag is only supported in &lt;strong&gt;AWS CLI version 2.15.0 and above&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Check Your Version:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;span class="c"&gt;# Should return: aws-cli/2.15.0 or newer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How to Upgrade:
&lt;/h3&gt;

&lt;h4&gt;
  
  
  macOS/Linux:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s2"&gt;"https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="s2"&gt;"awscliv2.zip"&lt;/span&gt;
unzip awscliv2.zip
&lt;span class="nb"&gt;sudo&lt;/span&gt; ./aws/install &lt;span class="nt"&gt;--update&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Windows:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Download: &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-windows.html" rel="noopener noreferrer"&gt;AWS CLI v2 MSI Installer&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Best Use Cases for This Flag
&lt;/h2&gt;

&lt;p&gt;This feature is ideal when &lt;strong&gt;fast instance termination or shutdown&lt;/strong&gt; is required and you’re okay with skipping cleanup steps:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Why It’s Useful&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;High Availability (HA)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Rapidly remove and replace unhealthy EC2s during failover.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Blue-Green Deployments&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Quickly decommission old environments.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CI/CD Test Runners&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Instantly clean up short-lived EC2s after test jobs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Spot Instances&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Avoid delays in auto-replacement workflows.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chaos Engineering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Force fail nodes to test system resilience.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Situations &amp;amp; Tools Where You &lt;em&gt;Should Not&lt;/em&gt; Use It
&lt;/h2&gt;

&lt;p&gt;Using &lt;code&gt;--skip-os-shutdown&lt;/code&gt; bypasses critical OS-level processes. Here’s a breakdown of where this could cause &lt;strong&gt;problems&lt;/strong&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Stateful Applications / Databases&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Why Not&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MySQL, PostgreSQL, MongoDB, Redis&lt;/td&gt;
&lt;td&gt;May lose in-memory or unflushed data; corrupt journals.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elasticsearch, Kafka&lt;/td&gt;
&lt;td&gt;Disrupts cluster state or causes shard inconsistency.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;tmpfs&lt;/code&gt; / RAM-backed processes&lt;/td&gt;
&lt;td&gt;Data is lost immediately.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;EC2 Lifecycle Tools &amp;amp; Shutdown Hooks&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Auto Scaling Lifecycle Hooks&lt;/td&gt;
&lt;td&gt;Terminating hook (&lt;code&gt;EC2_INSTANCE_TERMINATING&lt;/code&gt;) may never trigger.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpsWorks / Elastic Beanstalk&lt;/td&gt;
&lt;td&gt;Skips teardown, logs, and state tracking.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom AMIs with shutdown scripts&lt;/td&gt;
&lt;td&gt;Scripts for cleanup or final logging won’t run.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;CI/CD Agents &amp;amp; Test Frameworks&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Jenkins EC2 Agents&lt;/td&gt;
&lt;td&gt;Results/logs not archived, jobs may break.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Actions (self-hosted)&lt;/td&gt;
&lt;td&gt;Workspace cleanup skipped.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CodeDeploy&lt;/td&gt;
&lt;td&gt;Lifecycle events like &lt;code&gt;BeforeBlockTraffic&lt;/code&gt; skipped.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Monitoring &amp;amp; Security Systems&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch Agent, Datadog, New Relic&lt;/td&gt;
&lt;td&gt;Final logs/metrics may not be sent.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GuardDuty, OSQuery, Falco&lt;/td&gt;
&lt;td&gt;Missed signals, incomplete audits.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SOC2 / ISO-certified environments&lt;/td&gt;
&lt;td&gt;Could breach audit policies requiring graceful shutdowns.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  5. &lt;strong&gt;EC2 Features Requiring Shutdown&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;EC2 Hibernate&lt;/td&gt;
&lt;td&gt;Hibernate state won’t be saved.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AMI Creation&lt;/td&gt;
&lt;td&gt;Image may be inconsistent or dirty.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch Alarms&lt;/td&gt;
&lt;td&gt;May falsely trigger due to skipped signal.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto Recovery&lt;/td&gt;
&lt;td&gt;May misinterpret health check failures.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Behind the Scenes: What Happens Internally?
&lt;/h2&gt;

&lt;p&gt;When &lt;code&gt;--skip-os-shutdown&lt;/code&gt; is used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No ACPI signal&lt;/strong&gt; is sent to the OS&lt;/li&gt;
&lt;li&gt;AWS forcibly stops the instance at the hypervisor level&lt;/li&gt;
&lt;li&gt;RAM is purged&lt;/li&gt;
&lt;li&gt;OS cleanup or shutdown logic is entirely bypassed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s essentially a &lt;strong&gt;hard power-off&lt;/strong&gt;, not a shutdown.&lt;/p&gt;

&lt;h2&gt;
  
  
  EBS Volume Considerations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Attached EBS volumes &lt;strong&gt;remain intact&lt;/strong&gt;, but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Applications with delayed writes may leave &lt;strong&gt;incomplete data&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;File systems not mounted with &lt;code&gt;sync&lt;/code&gt; or not journaled may be &lt;strong&gt;inconsistent&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Use &lt;code&gt;sync&lt;/code&gt;, &lt;code&gt;fsync()&lt;/code&gt;, or journaling file systems to minimize risk.&lt;/p&gt;
&lt;/blockquote&gt;
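&lt;p&gt;As a concrete sketch (plain Python, no AWS dependency): flushing and &lt;code&gt;fsync&lt;/code&gt;-ing before you consider a write complete means even a hard power-off cannot lose the buffered bytes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os

def durable_write(path, data):
    # Write data, then flush Python's buffer and ask the kernel to
    # push it to stable storage -- after fsync returns, a hard
    # power-off (like --skip-os-shutdown) cannot lose these bytes.
    with open(path, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    return os.path.getsize(path)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;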

&lt;h2&gt;
  
  
  Monitoring Caveats
&lt;/h2&gt;

&lt;p&gt;Skipping shutdown can confuse your observability stack:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch Metrics&lt;/td&gt;
&lt;td&gt;May report inaccurate CPU/memory usage.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Datadog&lt;/td&gt;
&lt;td&gt;Final flush of metrics/logs skipped.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prometheus&lt;/td&gt;
&lt;td&gt;Node exporter may not unregister cleanly.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Recommendation&lt;/strong&gt;:&lt;br&gt;
Use &lt;strong&gt;EventBridge rules&lt;/strong&gt; to trigger compensating actions after termination.&lt;/p&gt;
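&lt;p&gt;A sketch of such a rule’s event pattern: it matches EC2 state-change events for terminated instances, so a cleanup Lambda can run the steps the skipped shutdown never did:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": {
    "state": ["terminated"]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;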

&lt;h2&gt;
  
  
  Advanced Workflow Example: Blue-Green Deployment
&lt;/h2&gt;

&lt;p&gt;Here’s how &lt;code&gt;--skip-os-shutdown&lt;/code&gt; fits into a zero-downtime deploy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Deploy new version (Green) → health check passes&lt;/li&gt;
&lt;li&gt;Route traffic to Green&lt;/li&gt;
&lt;li&gt;Drain and disable Blue&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;--skip-os-shutdown&lt;/code&gt; to instantly remove Blue&lt;/li&gt;
&lt;li&gt;Trigger cleanup Lambda via CloudWatch/EventBridge&lt;/li&gt;
&lt;li&gt;Free up EBS/ENI/IP and complete deploy&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Summary Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Best For&lt;/td&gt;
&lt;td&gt;Spot instances, failover systems, fast teardown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avoid In&lt;/td&gt;
&lt;td&gt;Databases, CI/CD runners, audit-compliant systems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;What’s Skipped&lt;/td&gt;
&lt;td&gt;Shutdown scripts, disk flush, monitoring agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLI Requirement&lt;/td&gt;
&lt;td&gt;AWS CLI v2.15.0+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EBS Risk&lt;/td&gt;
&lt;td&gt;Data may be inconsistent without flushing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Not For&lt;/td&gt;
&lt;td&gt;Hibernate, AMI creation, critical shutdown processes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Thoughts
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;--skip-os-shutdown&lt;/code&gt; is a powerful flag that prioritizes &lt;strong&gt;speed over safety&lt;/strong&gt;. Use it in &lt;strong&gt;automated, stateless environments&lt;/strong&gt;, but &lt;strong&gt;avoid it anywhere state, compliance, or graceful teardown matters&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Think of it as a tool in your belt—not a default behavior.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_StopInstances.html" rel="noopener noreferrer"&gt;AWS EC2 StopInstances API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/hashicorp/terraform-provider-aws/issues" rel="noopener noreferrer"&gt;Terraform GitHub Discussion for Support&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Related Blogs:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://redsignals.beehiiv.com/p/mastering-amazon-eks-upgrades-the-ultimate-senior-level-guide?utm_source=redsignals.beehiiv.com&amp;amp;utm_medium=newsletter&amp;amp;utm_campaign=kubelet-restart-in-aws-eks-causes-logs-fixes-node-stability-guide-2025&amp;amp;_bhlid=666dbd6e5f48537e68313a2ef75d8a064f84f565" rel="noopener noreferrer"&gt;Mastering Amazon EKS Upgrades: The Ultimate Senior-Level Guide&lt;/a&gt;
2.&lt;a href="https://redsignals.beehiiv.com/p/crashloopbackoff-with-no-logs-fix-guide-for-kubernetes-with-yaml-ci-cd?utm_source=redsignals.beehiiv.com&amp;amp;utm_medium=newsletter&amp;amp;utm_campaign=kubelet-restart-in-aws-eks-causes-logs-fixes-node-stability-guide-2025&amp;amp;_bhlid=e2cb5681c30fb120c494b9ca1f280e394379a46b" rel="noopener noreferrer"&gt; CrashLoopBackOff with No Logs - Fix Guide for Kubernetes with YAML &amp;amp; CI/CD &lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://redsignals.beehiiv.com/p/multi-tenancy-in-amazon-eks-secure-scalable-kubernetes-isolation-with-quotas-observability-dr?utm_source=redsignals.beehiiv.com&amp;amp;utm_medium=newsletter&amp;amp;utm_campaign=kubelet-restart-in-aws-eks-causes-logs-fixes-node-stability-guide-2025&amp;amp;_bhlid=99ef931134046c40d7dc6908b21af6110545d48a" rel="noopener noreferrer"&gt;Multi-Tenancy in Amazon EKS: Secure, Scalable Kubernetes Isolation with Quotas, Observability &amp;amp; DR &lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dubniumlabs.blogspot.com/2025/07/10-proven-kubectl-commands-ultimate.html?utm_source=redsignals.beehiiv.com&amp;amp;utm_medium=newsletter&amp;amp;utm_campaign=kubelet-restart-in-aws-eks-causes-logs-fixes-node-stability-guide-2025&amp;amp;_bhlid=a2fb8ef1ea88c0f1c355e674f2e9ca2425abcbdd" rel="noopener noreferrer"&gt;10 Proven kubectl Commands: The Ultimate 2025 AWS Kubernetes Guide&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/ismailkovvuru/one-container-per-pod-kubernetes-done-right-g5c?utm_source=redsignals.beehiiv.com&amp;amp;utm_medium=newsletter&amp;amp;utm_campaign=kubelet-restart-in-aws-eks-causes-logs-fixes-node-stability-guide-2025&amp;amp;_bhlid=39941c1c8e3dc29b0588d6f7c84df2af4f4f0c8a"&gt;One Container per Pod: Kubernetes Done Right&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://redsignals.beehiiv.com/p/why-kubernetes-cluster-autoscaler-fails-fixes-logs-yaml-inside?utm_source=redsignals.beehiiv.com&amp;amp;utm_medium=newsletter&amp;amp;utm_campaign=kubelet-restart-in-aws-eks-causes-logs-fixes-node-stability-guide-2025&amp;amp;_bhlid=48e46711b877915601fc571417d7ce99571e9b7d" rel="noopener noreferrer"&gt;Why Kubernetes Cluster Autoscaler Fails ? Fixes, Logs &amp;amp; YAML Inside&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ismailkovvuru.hashnode.dev/ansible-inventory-guide-2025-choosing-between-static-and-dynamic-for-aws?utm_source=redsignals.beehiiv.com&amp;amp;utm_medium=newsletter&amp;amp;utm_campaign=kubelet-restart-in-aws-eks-causes-logs-fixes-node-stability-guide-2025&amp;amp;_bhlid=7fac456122bf3a9171eb94c0c58b815233558f5c" rel="noopener noreferrer"&gt;Ansible Inventory Guide 2025&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;&lt;a href="https://ismailkovvuru.hashnode.dev/devops-without-observability-is-a-disaster-waiting-to-happen?utm_source=redsignals.beehiiv.com&amp;amp;utm_medium=newsletter&amp;amp;utm_campaign=kubelet-restart-in-aws-eks-causes-logs-fixes-node-stability-guide-2025&amp;amp;_bhlid=b1110d825afdab4cd940fe6f35a3ba42e14d2679" rel="noopener noreferrer"&gt;DevOps without Observability&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For more topics visit &lt;a href="https://medium.com/@ismailkovvuru" rel="noopener noreferrer"&gt;Medium&lt;/a&gt; , &lt;a href="https://redsignals.beehiiv.com/subscribe" rel="noopener noreferrer"&gt;Red Signals&lt;/a&gt; and &lt;a href="https://dubniumlabs.blogspot.com/?utm_source=redsignals.beehiiv.com&amp;amp;utm_medium=newsletter&amp;amp;utm_campaign=kubelet-restart-in-aws-eks-causes-logs-fixes-node-stability-guide-2025&amp;amp;_bhlid=f52a3b48146b6d92fd7365510ef46fb42486d7f0" rel="noopener noreferrer"&gt;Dubniumlabs&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>discuss</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>AWS EC2 Placement Groups Explained (2025): High Availability, Cluster, Spread &amp; Partition with Real-World Automation</title>
      <dc:creator>Ismail Kovvuru</dc:creator>
      <pubDate>Fri, 18 Jul 2025 06:57:14 +0000</pubDate>
      <link>https://dev.to/ismailkovvuru/aws-ec2-placement-groups-explained-2025-high-availability-cluster-spread-partition-with-4b0b</link>
      <guid>https://dev.to/ismailkovvuru/aws-ec2-placement-groups-explained-2025-high-availability-cluster-spread-partition-with-4b0b</guid>
      <description>&lt;p&gt;Learn to master AWS EC2 Placement Groups in 2025: Cluster, Spread, and Partition. When to use them, real scenarios, Terraform, CloudFormation &amp;amp; CLI scripts, capacity pitfalls, practical HA design, and troubleshooting.&lt;/p&gt;

&lt;p&gt;Many AWS engineers never touch Placement Groups — and then wonder why their HPC jobs fail to launch or why a single rack outage kills their entire Kafka cluster.&lt;/p&gt;

&lt;p&gt;EC2 Placement Groups are one of AWS’s least understood yet most powerful tools for designing true rack-level High Availability, low-latency HPC, and fault-domain isolation at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is an EC2 Placement Group?
&lt;/h2&gt;

&lt;p&gt;By default, when you launch EC2 instances, AWS places them wherever its capacity and resiliency needs are best served.&lt;/p&gt;

&lt;p&gt;A Placement Group (PG) lets you override that default — so you can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pack nodes together for max speed&lt;/li&gt;
&lt;li&gt;Spread them apart for fault isolation&lt;/li&gt;
&lt;li&gt;Split them into clear failure domains&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The 3 Placement Group Types
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| Type          | Best For                   | How It Works                                                     | Limits                                             | When to Use                                                      |
| ------------- | -------------------------- | ---------------------------------------------------------------- | -------------------------------------------------- | ---------------------------------------------------------------- |
|   Cluster     | HPC, ML, GPU               | Puts all nodes on same rack / rack set                           | Single AZ, practical limit is available hardware   | When you need ultra-low latency, high throughput                 |
|   Spread      | Small HA nodes             | Guarantees each node is on separate rack                         | 7 per AZ, can span AZs                             | When 3–7 nodes must survive single rack failure                  |
|   Partition   | Large distributed clusters | Divides nodes into partitions → racks → explicit failure domains | Up to 7 partitions per AZ, each can hold many EC2s | When you need explicit rack-level fault domains for big clusters |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Cluster Placement Group — What It Is?&lt;/strong&gt;&lt;br&gt;
A Cluster Placement Group tries to pack your EC2 instances as close together as possible, usually in the same rack, to minimize latency and maximize throughput.&lt;/p&gt;

&lt;p&gt;Used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HPC jobs&lt;/li&gt;
&lt;li&gt;Distributed ML training&lt;/li&gt;
&lt;li&gt;Big data pipelines needing tight east-west traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why not always use it?&lt;br&gt;
If the rack runs out of slots, your launch fails — so you must design for capacity.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Always pair Cluster PG with Capacity Reservations for predictable launches.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-----------------------------+
|      Cluster PG (AZ-a)      |
+-----------------------------+
|                             |
| [ Rack A ]                  |
| -------------------------   |
| EC2-1   EC2-2   EC2-3       |
| EC2-4   EC2-5   EC2-6       |
|                             |
+-----------------------------+

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Spread Placement Group — What It Is&lt;/strong&gt;&lt;br&gt;
A Spread Placement Group places each EC2 on a completely separate rack with separate hardware, power, and network.&lt;/p&gt;

&lt;p&gt;Used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small sets of must-not-fail nodes&lt;/li&gt;
&lt;li&gt;HA quorum services (web servers, payment front ends, DNS)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Limit: Only 7 per AZ — so multi-AZ Spread PGs are common.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-----------------------------------------------+
|               Spread PG (AZ-a)                |
+-----------------------------------------------+
|                                               |
| [ Rack A ]   EC2-1                            |
| [ Rack B ]   EC2-2                            |
| [ Rack C ]   EC2-3                            |
|                                               |
+-----------------------------------------------+

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Partition Placement Group — What It Is&lt;/strong&gt;&lt;br&gt;
A Partition PG splits your cluster into partitions → mapped to rack sets → each acts as a fault domain.&lt;/p&gt;

&lt;p&gt;Used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Large clusters like Kafka, Cassandra, Hadoop&lt;/li&gt;
&lt;li&gt;If Rack A fails, only Partition 1 is affected → cluster keeps running
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+-----------------------------+
|  Partition Placement Group  |
+-----------------------------+
|                             |
| Partition 1 → Rack A        |
| -------------------------   |
| EC2-1   EC2-2   EC2-3       |
|                             |
| Partition 2 → Rack B        |
| -------------------------   |
| EC2-4   EC2-5   EC2-6       |
|                             |
| Partition 3 → Rack C        |
| -------------------------   |
| EC2-7   EC2-8   EC2-9       |
+-----------------------------+

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
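&lt;p&gt;In CloudFormation, a partition PG like the one sketched above can be declared as follows (a sketch; the resource name is illustrative, and &lt;code&gt;PartitionCount&lt;/code&gt; sets the number of fault domains):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  KafkaPlacementGroup:
    Type: AWS::EC2::PlacementGroup
    Properties:
      Strategy: partition
      PartitionCount: 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;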

&lt;h2&gt;
  
  
  How Many EC2s? — Actual Limits
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| Type          | Max per AZ                                     | Multi-AZ?                     | Notes                               |
| ------------- | ---------------------------------------------- | ----------------------------- | ----------------------------------- |
| 1.Cluster     | No strict limit, practical capacity limit only |   One AZ only                 | Constrained by available rack slots |
| 2.Spread      | 7 per AZ                                       |   Create one Spread PG per AZ | Each instance on different rack     |
| 3.Partition   | 7 partitions per AZ (each can hold 100s)       |   One AZ only                 | Must replicate across AZs yourself  |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
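&lt;p&gt;You can verify what the table above describes in your own account by listing existing placement groups (the &lt;code&gt;--query&lt;/code&gt; projection is just one way to format the output):&lt;/p&gt;

```shell
# List placement groups with their strategy and state
aws ec2 describe-placement-groups \
  --query 'PlacementGroups[].{Name:GroupName,Strategy:Strategy,State:State}' \
  --output table
```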

&lt;h2&gt;
  
  
  Why These Limits Exist
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| Type          | Limit               | Why?                            | If Exceeded                          |
| ------------- | ------------------- | ------------------------------- | ------------------------------------ |
| 1.Cluster     | Hardware capacity   | All nodes must fit on same rack | `InsufficientInstanceCapacity` error |
| 2.Spread      | 7 per AZ            | AWS guarantees unique racks     | 8th instance won’t launch            |
| 3.Partition   | 7 partitions per AZ | 7 unique rack sets per AZ       | Error if you request 8+ partitions   |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;These limits come from real hardware constraints, not arbitrary quotas, so you must design around them.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Real Error Snippet
&lt;/h2&gt;

&lt;p&gt;Trying to launch a large Cluster PG in an AZ without spare capacity? You’ll see:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;An error occurred (InsufficientInstanceCapacity) when calling the RunInstances operation:
We currently do not have sufficient m5.4xlarge capacity in the Availability Zone you requested.
Our system will be working on provisioning additional capacity.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;How to fix:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retry later&lt;/li&gt;
&lt;li&gt;Switch to another AZ&lt;/li&gt;
&lt;li&gt;Use a Capacity Reservation:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ec2 create-capacity-reservation \
  --instance-type m5.4xlarge \
  --instance-count 4 \
  --availability-zone us-east-1a

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
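&lt;p&gt;Once the reservation exists, you can target it explicitly at launch. A sketch, where &lt;code&gt;cr-xxxxxxxx&lt;/code&gt; stands in for the reservation ID returned by the previous command:&lt;/p&gt;

```shell
# Launch into the capacity you reserved (replace the placeholder IDs)
aws ec2 run-instances \
  --image-id ami-xxxxxxx \
  --instance-type m5.4xlarge \
  --count 4 \
  --capacity-reservation-specification \
    "CapacityReservationTarget={CapacityReservationId=cr-xxxxxxxx}"
```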

&lt;h2&gt;
  
  
  Terraform, CLI &amp;amp; CloudFormation — Explained
&lt;/h2&gt;

&lt;p&gt;Terraform — Spread PG&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Define a Placement Group resource in AWS using Terraform
resource "aws_placement_group" "spread_pg" {
  name     = "my-spread-pg"   # Human-friendly name for this PG
  strategy = "spread"         # Placement strategy: Spread means each instance on a separate rack
}

# Launch an EC2 instance attached to the Spread Placement Group
resource "aws_instance" "web_node" {
  ami             = "ami-xxxxxxx"                 # Replace with your AMI ID (e.g., Amazon Linux)
  instance_type   = "t3.micro"                    # Choose your desired instance type
  placement_group = aws_placement_group.spread_pg.name  # Attach to the defined Spread PG
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;How to use&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Save as main.tf&lt;/li&gt;
&lt;li&gt;Run:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform init   # Initialize Terraform
terraform plan   # See what will be created
terraform apply  # Create the PG and launch your instance

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;AWS CLI — Spread PG&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create a Spread Placement Group named "my-spread-pg"
aws ec2 create-placement-group \
  --group-name my-spread-pg \
  --strategy spread

# Launch an EC2 instance into the Spread Placement Group
# Replace ami-xxxxxxx with your AMI ID and choose your instance type
# (inline comments after a line-continuation backslash would break the command)
aws ec2 run-instances \
  --image-id ami-xxxxxxx \
  --instance-type t3.micro \
  --placement GroupName=my-spread-pg

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;How to use&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Replace ami-xxxxxxx with a real AMI ID (e.g., Amazon Linux 2).&lt;/li&gt;
&lt;li&gt;Run each command in your terminal (you must be authenticated, e.g., via &lt;code&gt;aws configure&lt;/code&gt;).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;CloudFormation — Spread PG&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Resources:
  # Define the Placement Group with Spread strategy
  MyPlacementGroup:
    Type: AWS::EC2::PlacementGroup
    Properties:
      Strategy: spread

  # Launch an EC2 instance inside the Placement Group
  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-xxxxxxx      # Replace with your AMI ID
      InstanceType: t3.micro    # Your desired instance type
      PlacementGroupName: !Ref MyPlacementGroup  # Reference the defined Spread PG

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;How to use:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Save as spread-pg.yaml&lt;/li&gt;
&lt;li&gt;Deploy with:
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws cloudformation create-stack \
  --stack-name my-spread-stack \
  --template-body file://spread-pg.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Tips&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Always double-check your AMI ID — wrong region = launch failure.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;terraform destroy&lt;/code&gt; to clean up your test infra.&lt;/li&gt;
&lt;li&gt;Tag your Placement Groups (&lt;code&gt;tags&lt;/code&gt; block in Terraform) for cost tracking.&lt;/li&gt;
&lt;li&gt;Combine with an Auto Scaling Group if you need elasticity with rack-level isolation.&lt;/li&gt;
&lt;/ol&gt;
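&lt;p&gt;Tip 3 can also be applied at creation time with the CLI (tag keys and values here are illustrative):&lt;/p&gt;

```shell
# Create a Spread PG with cost-tracking tags attached at creation
aws ec2 create-placement-group \
  --group-name my-spread-pg \
  --strategy spread \
  --tag-specifications 'ResourceType=placement-group,Tags=[{Key=team,Value=platform},{Key=env,Value=test}]'
```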
&lt;h2&gt;
  
  
  AWS Terms Cheat Sheet
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| Term                 | What It Means                  |
| -------------------- | ------------------------------ |
| Placement Group      | Logical rack placement control |
| Cluster PG           | Same rack                      |
| Spread PG            | Separate racks                 |
| Partition PG         | Fault-domain racks             |
| Capacity Reservation | Guarantees physical slots      |
| AZ                   | Availability Zone              |
| ASG                  | Auto Scaling Group             |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Updated — Multi-AZ Placement Groups (2025)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Cluster PG:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Not multi-AZ capable.&lt;/strong&gt; A &lt;strong&gt;Cluster Placement Group&lt;/strong&gt; must keep all instances physically close on the &lt;em&gt;same rack or rack set&lt;/em&gt; for ultra-low latency — this is only possible &lt;strong&gt;inside a single Availability Zone&lt;/strong&gt;. Cross-AZ networking would destroy its main benefit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Partition PG:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Not multi-AZ capable.&lt;/strong&gt; A &lt;strong&gt;Partition Placement Group&lt;/strong&gt; logically maps each partition to a unique rack set within &lt;strong&gt;one AZ&lt;/strong&gt; to create explicit failure domains. AWS does not support Partition PGs that span multiple AZs. To achieve multi-AZ fault tolerance, deploy &lt;strong&gt;separate Partition PGs in each AZ&lt;/strong&gt; and handle replication at the application layer (e.g., Kafka multi-AZ topic replication).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Spread PG:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Multi-AZ capable.&lt;/strong&gt; Unlike Cluster or Partition, Spread placement works across &lt;strong&gt;multiple AZs&lt;/strong&gt;: a rack-level Spread PG can span AZs in the same Region (with a limit of seven running instances per AZ per group), or you can create &lt;strong&gt;one Spread PG per AZ&lt;/strong&gt;. Either way, instances are guaranteed separate racks within their AZ. Spreading a small, critical node set across AZs this way gives you both &lt;strong&gt;rack-level failure isolation&lt;/strong&gt; and &lt;strong&gt;AZ-level disaster recovery&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best Practice:&lt;/strong&gt; When using Spread PG for multi-AZ HA, plan separate subnets, separate Spread PGs, and properly configure cross-AZ load balancing (e.g., ALB or NLB) to keep traffic healthy if a rack or AZ fails.&lt;/p&gt;
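&lt;p&gt;A minimal sketch of the per-AZ pattern (group names, AMI, and subnet IDs are illustrative; repeat the launch for each AZ’s subnet and group):&lt;/p&gt;

```shell
# One Spread PG per AZ
aws ec2 create-placement-group --group-name spread-us-east-1a --strategy spread
aws ec2 create-placement-group --group-name spread-us-east-1b --strategy spread

# Launch a critical node into one AZ via its subnet and Spread PG
aws ec2 run-instances \
  --image-id ami-xxxxxxx \
  --instance-type t3.micro \
  --subnet-id subnet-aaaa1111 \
  --placement GroupName=spread-us-east-1a
```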
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Designing high-performance, fault-tolerant EC2 workloads is far more than just choosing the right instance type — it’s also about controlling where your instances run physically.&lt;/p&gt;

&lt;p&gt;Placement Groups (Cluster, Spread, and Partition) are powerful tools for advanced engineers who truly need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ultra-low network latency (Cluster)&lt;/li&gt;
&lt;li&gt;Rack-level fault tolerance for critical nodes (Spread)&lt;/li&gt;
&lt;li&gt;Explicit fault domains for large clusters (Partition)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Are they recommended for every workload?&lt;/strong&gt;&lt;br&gt;
No. For most small or generic auto-scaling workloads, AWS’s default placement is good enough — using Placement Groups when you don’t need them can add unnecessary complexity and failure points.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When are they strongly recommended?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When you run HPC, distributed GPU, or tight MPI jobs needing rack-level speed (Cluster)&lt;/li&gt;
&lt;li&gt;When you must guarantee that no single rack outage takes down all quorum nodes (Spread)&lt;/li&gt;
&lt;li&gt;When you design large, stateful clusters that need explicit failure domains for fast recovery (Partition)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What to keep in mind before using them:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Always verify AZ capacity. Cluster PGs commonly hit &lt;code&gt;InsufficientInstanceCapacity&lt;/code&gt; if you request large node counts.&lt;/li&gt;
&lt;li&gt;Combine Cluster PGs with Capacity Reservations for mission-critical HPC.&lt;/li&gt;
&lt;li&gt;Design Spread PGs for multi-AZ from day one — never assume one AZ’s 7-instance limit will be enough for future scale.&lt;/li&gt;
&lt;li&gt;Partition PGs do not magically replicate across AZs — you must handle multi-AZ replication at the app level.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Placement Groups work best when you deeply understand the physical infra behind AWS’s virtual promise. Used right, they unlock HA and throughput levels you can’t get from default placement alone. Used wrong, they cause launch failures or false sense of redundancy.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  References — AWS Official Docs
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html" rel="noopener noreferrer"&gt;AWS Placement Groups Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-reservations.html" rel="noopener noreferrer"&gt;AWS EC2 Capacity Reservations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/hpc/" rel="noopener noreferrer"&gt;AWS HPC Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/reliability-pillar.html" rel="noopener noreferrer"&gt;AWS Well-Architected Framework — Reliability Pillar&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  You will find these topics useful
&lt;/h2&gt;


&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://dubniumlabs.blogspot.com/2025/07/aws-bedrock-vs-sagemaker-jumpstart.html" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogger.googleusercontent.com%2Fimg%2Fb%2FR29vZ2xl%2FAVvXsEhA7AgaRMGT9Xux8kvXSEkLaH2t6s1KgdlB8Q3yp_Ii3wX2e-PD0iRxHnwAYJxIWo-IZM-GNfcG2Ptz8Xv7qpLlsj2fpFC3zk0VC3ize2-yMYLUOqGJEJsNAW052RoHgnJBDuykuir8SRcKsYaL3EwrI7wLuyRd98SaXx-EGKN0tNUGQqvMzAALu-KCSQU%2Fw1200-h630-p-k-no-nu%2FAWS%2520Bedrock%2520vs%2520SageMaker.png" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://dubniumlabs.blogspot.com/2025/07/aws-bedrock-vs-sagemaker-jumpstart.html" rel="noopener noreferrer" class="c-link"&gt;
            AWS Bedrock vs SageMaker JumpStart: Which One to Use for Your GenAI Use Case?
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            AWS Bedrock or SageMaker JumpStart? Deep-dive comparison for GenAI projects. Use cases, costs, and performance insights explained clearly.
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdubniumlabs.blogspot.com%2Ffavicon.ico"&gt;
          dubniumlabs.blogspot.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://dubniumlabs.blogspot.com/2025/07/10-proven-kubectl-commands-ultimate.html" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblogger.googleusercontent.com%2Fimg%2Fb%2FR29vZ2xl%2FAVvXsEi6QEkuFCmm4x_AUaGxT-TLvuwhYZlzsYlV9zcBiFjskpKEs6AMI7bZpPlPoCq0gW_TsjGbJMBg-fflcKVyCpIF4TdTXLzI2GX8FAPlK9Vqf07WDVMRSUmYvlYE1z53zyHcnUmjfXsFGct4ZjkMVIKCRmvOIh4ESYT_lSSeAV010fvSjOEjzwXTfG2tZO8%2Fw1200-h630-p-k-no-nu%2FThe%2520Ultimate%25202025%2520AWS%2520Kubernetes%2520Guide.png" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://dubniumlabs.blogspot.com/2025/07/10-proven-kubectl-commands-ultimate.html" rel="noopener noreferrer" class="c-link"&gt;
            10 Proven kubectl Commands: The Ultimate 2025 AWS Kubernetes Guide
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            10 proven kubectl commands with examples for AWS DevOps in 2025. Master Kubernetes clusters, pods, deployments, services, and ultimate hands-on.
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdubniumlabs.blogspot.com%2Ffavicon.ico"&gt;
          dubniumlabs.blogspot.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



</description>
      <category>aws</category>
      <category>devops</category>
      <category>discuss</category>
      <category>terraform</category>
    </item>
    <item>
      <title>Kubernetes Autoscaling Fails Silently. Here's Why and How to Fix It</title>
      <dc:creator>Ismail Kovvuru</dc:creator>
      <pubDate>Sun, 06 Jul 2025 08:39:51 +0000</pubDate>
      <link>https://dev.to/ismailkovvuru/kubernetes-autoscaling-fails-silently-heres-why-and-how-to-fix-it-34h8</link>
      <guid>https://dev.to/ismailkovvuru/kubernetes-autoscaling-fails-silently-heres-why-and-how-to-fix-it-34h8</guid>
      <description>&lt;h2&gt;
  
  
  Why Isn’t Your Kubernetes Cluster Autoscaler Scaling?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Your pods are pending. The nodes aren’t scaling. Logs say nothing.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sound familiar? You’re not alone.&lt;/p&gt;

&lt;p&gt;Most Kubernetes engineers hit this silently frustrating issue. And the truth is, the autoscaler isn’t broken. It’s just &lt;strong&gt;misunderstood&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here’s a sneak peek into what’s really going on:&lt;/p&gt;

&lt;h3&gt;
  
  
  What You’ll Learn in This Breakdown:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Why Cluster Autoscaler &lt;strong&gt;ignores&lt;/strong&gt; some pods (even if they're pending)&lt;/li&gt;
&lt;li&gt;How &lt;code&gt;nodeSelector&lt;/code&gt;, &lt;code&gt;taints&lt;/code&gt;, and affinity rules silently block scaling&lt;/li&gt;
&lt;li&gt;What real Cluster Autoscaler logs actually mean&lt;/li&gt;
&lt;li&gt;The hidden impact of PDBs, PVC zones, and priority classes&lt;/li&gt;
&lt;li&gt;YAML: Before &amp;amp; After examples that fix scaling issues instantly&lt;/li&gt;
&lt;li&gt;Terraform ASG configs for autoscaler to work properly&lt;/li&gt;
&lt;li&gt;Observability patterns + self-healing strategies (Kyverno, alerts, CI/CD)&lt;/li&gt;
&lt;/ul&gt;
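&lt;p&gt;If you want to see those decisions for yourself before reading on, the Cluster Autoscaler logs its reasoning. A sketch, assuming the common &lt;code&gt;kube-system&lt;/code&gt; deployment name (adjust for your install):&lt;/p&gt;

```shell
# Tail the Cluster Autoscaler's own logs (deployment name varies by install)
kubectl -n kube-system logs deployment/cluster-autoscaler --tail=100

# Filter for scale-up decisions, e.g. "pod didn't trigger scale-up"
kubectl -n kube-system logs deployment/cluster-autoscaler | grep -i 'scale-up'
```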

&lt;h3&gt;
  
  
  Here’s a Sample Fix:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Bad Pod Spec (won’t scale):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;instance-type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpu&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fixed YAML (scales properly):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;250m"&lt;/span&gt;
    &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;512Mi"&lt;/span&gt;
&lt;span class="na"&gt;tolerations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app-node"&lt;/span&gt;
    &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Equal"&lt;/span&gt;
    &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
    &lt;span class="na"&gt;effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NoSchedule"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Want the Full Breakdown?
&lt;/h3&gt;

&lt;p&gt;I’ve published the &lt;strong&gt;entire guide&lt;/strong&gt;, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real autoscaler logs&lt;/li&gt;
&lt;li&gt;Terraform IAM &amp;amp; ASG configs&lt;/li&gt;
&lt;li&gt;YAML validation checks&lt;/li&gt;
&lt;li&gt;Edge case scenarios no one talks about&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 &lt;strong&gt;Read the full post here on RedSignals:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;&lt;a href="https://redsignals.beehiiv.com/p/why-kubernetes-cluster-autoscaler-fails-fixes-logs-yaml-inside" rel="noopener noreferrer"&gt;Why Kubernetes Cluster Autoscaler Fails — Fixes, Logs &amp;amp; YAML Inside&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>aws</category>
      <category>devops</category>
      <category>discuss</category>
    </item>
    <item>
      <title>AWS Compute Wars 2025: EC2 vs Lambda vs Fargate vs EKS</title>
      <dc:creator>Ismail Kovvuru</dc:creator>
      <pubDate>Wed, 25 Jun 2025 10:58:12 +0000</pubDate>
      <link>https://dev.to/ismailkovvuru/aws-compute-wars-2025-ec2-vs-lambda-vs-fargate-vs-eks-4ici</link>
      <guid>https://dev.to/ismailkovvuru/aws-compute-wars-2025-ec2-vs-lambda-vs-fargate-vs-eks-4ici</guid>
      <description>&lt;p&gt;In 2025, choosing the right AWS compute service isn't just about knowing the features — it's about making the &lt;em&gt;right decision&lt;/em&gt; for your workload, budget, and team.&lt;/p&gt;

&lt;p&gt;As an AWS Engineer or DevOps Architect, you’ve likely asked yourself:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Should I go with Lambda or ECS?”&lt;br&gt;&lt;br&gt;
“Is EC2 still relevant in 2025?”&lt;br&gt;&lt;br&gt;
“Do we really need to adopt EKS?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I wrote a detailed blog post exploring these questions — backed by real-world experience, architecture patterns, and cost-performance tradeoffs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you’ll learn in the post:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When to choose EC2, Lambda, ECS, or EKS&lt;/li&gt;
&lt;li&gt;Cost vs complexity vs control breakdown&lt;/li&gt;
&lt;li&gt;Real use cases from cloud-native and hybrid teams&lt;/li&gt;
&lt;li&gt;An architecture decision matrix you can actually use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whether you’re designing new systems or modernizing legacy ones, this guide will help you make confident compute decisions in 2025 and beyond.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Read the full post on Medium:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://medium.com/@ismailkovvuru/aws-compute-services-in-2025-a-aws-engineers-guide-to-real-world-architecture-decisions-2f23f26524b2" rel="noopener noreferrer"&gt;https://medium.com/@ismailkovvuru/aws-compute-services-in-2025-a-aws-engineers-guide-to-real-world-architecture-decisions-2f23f26524b2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I'd love your thoughts:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Which AWS compute service are you using in 2025 — and why?&lt;/p&gt;

&lt;p&gt;Drop your stack or experience in the comments 👇&lt;br&gt;&lt;br&gt;
Let’s talk DevOps architecture!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>cloud</category>
      <category>discuss</category>
    </item>
    <item>
      <title>One Container per Pod: Kubernetes Done Right</title>
      <dc:creator>Ismail Kovvuru</dc:creator>
      <pubDate>Sun, 22 Jun 2025 13:29:16 +0000</pubDate>
      <link>https://dev.to/ismailkovvuru/one-container-per-pod-kubernetes-done-right-g5c</link>
      <guid>https://dev.to/ismailkovvuru/one-container-per-pod-kubernetes-done-right-g5c</guid>
      <description>&lt;p&gt;Learn why running one container per pod is a Kubernetes best practice. Explore real-world fintech use cases, security benefits, and scaling advantages.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is a Pod in Kubernetes?
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Definition:&lt;/strong&gt;&lt;br&gt;
A Pod is the smallest deployable unit in Kubernetes. It represents a single instance of a running process in your cluster.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zl6hac3f3sfv494hzk9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0zl6hac3f3sfv494hzk9.png" alt="Kubernetes Pod" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A pod:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can host one or more containers&lt;/li&gt;
&lt;li&gt;Shares the same network namespace and storage volumes among all its containers&lt;/li&gt;
&lt;li&gt;Is ephemeral — meant to be created, run, and replaced automatically when needed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Principle: “One Container per Pod”
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Definition:&lt;/strong&gt;&lt;br&gt;
While Kubernetes supports multiple containers in one pod, the best practice is to run only one container per pod — the single responsibility principle applied at the pod level.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This principle makes each pod act like a microservice unit, cleanly isolated, focused, and independently scalable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "One Container per Pod" in Production?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Isolation of Responsibility&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each container does one job, which makes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Debugging easier&lt;/li&gt;
&lt;li&gt;Logging cleaner&lt;/li&gt;
&lt;li&gt;Ownership clear (dev vs. ops)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scalability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Horizontally scale pods with a single container based on CPU/RAM/load&lt;/li&gt;
&lt;li&gt;Apply pod autoscaling without worrying about co-packaged containers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Maintainability&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easier CI/CD pipelines&lt;/li&gt;
&lt;li&gt;Smaller image sizes&lt;/li&gt;
&lt;li&gt;Easier to test, upgrade, or rollback individually&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Fault Tolerance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If one pod/container crashes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only that service is affected, not an entire coupled group&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What If We Don’t Follow It?
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|   Problem                    |   Consequence                                                |
| ----------------------------- | ------------------------------------------------------------ |
| Multiple unrelated containers | Tight coupling, hard to debug &amp;amp; test                         |
| Mixed logging/monitoring      | Noisy logs, ambiguous metrics                                |
| Co-dependency                 | Can’t scale or update services independently                 |
| Increased blast radius        | One container failure could affect the whole pod’s operation |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Where and How to Use This Principle
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;When to Use One Container per Pod&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| Situation                        | Use One Container? | Why                                             |
| -------------------------------- | ------------------ | ----------------------------------------------- |
| Independent microservices        |   Yes              | Clean design, easy to manage                    |
| Stateless backend or API         |   Yes              | Scalability, fault isolation                    |
| Event-driven consumers           |   Yes              | Simple, lean, retry logic handled by controller |
| Data processors (e.g., ETL jobs) |   Yes              | Lifecycle-bound, logs isolated                  |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to Consider Multiple Containers (with Caution)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| Use Case                | Type         | Description                                     |
| ----------------------- | ------------ | ----------------------------------------------- |
| Sidecar (logging/proxy) | Shared Scope | Shares volume or networking with main container |
| Init container          | Pre-start    | Runs setup script before app starts             |
| Ambassador pattern      | Gateway      | Acts as proxy, often combined with service mesh |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Benefits of Using One Container per Pod&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| Benefit            | Description                                 |
| ------------------ | ------------------------------------------- |
| Simpler Deployment | Easy to define and deploy YAML              |
| Easier Monitoring  | Logs and metrics tied to one process        |
| Better DevOps Flow | Aligned with microservice CI/CD pipelines   |
| Container Reuse    | One container image = multiple environments |
| Rolling Updates    | Zero-downtime with **Deployment strategy**  |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How to Apply It in Practice
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes YAML Example (Single Container Pod):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: my-api
spec:
  containers:
  - name: api-container
    image: my-api-image:v1
    ports:
    - containerPort: 8080

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
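&lt;p&gt;To try the manifest above, save it to a file and apply it (the filename is illustrative):&lt;/p&gt;

```shell
# Create the pod and confirm its single container is running
kubectl apply -f my-api-pod.yaml
kubectl get pod my-api
kubectl logs my-api -c api-container
```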



&lt;p&gt;&lt;strong&gt;Best Applied In:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production microservices&lt;/li&gt;
&lt;li&gt;CI/CD pipelines&lt;/li&gt;
&lt;li&gt;Infrastructure-as-Code (IaC) like Helm, Kustomize, Terraform&lt;/li&gt;
&lt;li&gt;Monitoring dashboards (Grafana/Prometheus), because metrics/logs are clean&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Recommendation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| Aspect                   | Recommendation                                                                  | Reason                   |
| ------------------------ | ------------------------------------------------------------------------------- | ------------------------ |
| **Production Use**       |   Strongly Yes                                                                  | Clean, scalable, secure  |
| **Learning/Dev**         |   Yes                                                                           | Easier to debug and test |
| **Multiple Containers?** | Use only with Sidecar or Init Containers when **tight coupling is intentional** |                          |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Breaking the Rule with Purpose: When to Use Sidecars, Init Containers, and OPA in Kubernetes
&lt;/h2&gt;

&lt;p&gt;In Kubernetes, the principle of "One Container per Pod" is often recommended to maintain simplicity, separation of concerns, and ease of scaling. This approach ensures each pod does one thing well, following the Unix philosophy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ut5ap2trvvh40tsb2i7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ut5ap2trvvh40tsb2i7.png" alt="kubernetes" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But let’s face it — production environments are complex.&lt;/p&gt;

&lt;p&gt;There are real-world scenarios where this rule, while solid, becomes a bottleneck. For example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;What if your application needs a logging agent running alongside it?&lt;/li&gt;
&lt;li&gt;What if you need to perform a setup task before the main container starts?&lt;/li&gt;
&lt;li&gt;What if you need to enforce security policies on what gets deployed?&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is where Kubernetes-native Pod Patterns come into play. These aren’t workarounds — they’re intentional design features, battle-tested across thousands of production clusters.&lt;/p&gt;

&lt;p&gt;Let’s dive into these patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kubernetes Pod Design Patterns: Definitions and Purposes
&lt;/h2&gt;

&lt;p&gt;Before comparing use cases, here’s a quick overview of each pattern.&lt;br&gt;
&lt;strong&gt;Sidecar Container&lt;/strong&gt;&lt;br&gt;
A sidecar is a secondary container in the same pod as your main app. It usually provides auxiliary features like logging, monitoring, service mesh proxies (e.g., Envoy in Istio), or data sync tools.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Example: Fluent Bit running as a sidecar to ship logs to a centralized logging system like ELK or CloudWatch.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Init Container&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An Init Container runs before your main application container starts. It’s used for tasks that must complete successfully before the app begins, such as waiting for a database to become available or initializing a volume.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Example: A script that pulls secrets from AWS Secrets Manager before the main app starts.&lt;/p&gt;
&lt;/blockquote&gt;
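
&lt;p&gt;That database-wait case can be sketched as a minimal manifest. This is an illustrative sketch, not part of the original setup: the &lt;code&gt;busybox&lt;/code&gt; image, the &lt;code&gt;bank-db&lt;/code&gt; host, and port 5432 are assumed placeholders.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: balance-app
spec:
  initContainers:
    - name: wait-for-db
      image: busybox:1.36
      # Blocks until the database answers on its port;
      # the main container starts only after this exits successfully.
      command: ['sh', '-c', 'until nc -z bank-db 5432; do sleep 2; done']
  containers:
    - name: balance-app
      image: mybank/balance-check:v1
      ports:
        - containerPort: 8080

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;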

&lt;p&gt;&lt;strong&gt;Multi-Container Pod&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sometimes, you need multiple containers to share resources (like volumes or network namespaces) and work tightly together as a single unit. Kubernetes allows this via multi-container pods. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Example: An app + proxy architecture, where a caching proxy like NGINX shares a volume with the app to serve cached assets.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;OPA (Open Policy Agent)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OPA is not a pod pattern, but a Kubernetes-integrated policy engine. It runs as an admission controller and evaluates policies before allowing workloads to be deployed.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Example: Prevent deploying pods that run as root or don’t have resource limits defined.&lt;/p&gt;
&lt;/blockquote&gt;
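
&lt;p&gt;Rules like those can be written as a short Rego sketch. This is an illustration under assumptions: the field paths follow the plain OPA admission webhook's &lt;code&gt;AdmissionReview&lt;/code&gt; input, and the deny messages are made up.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package kubernetes.admission

# Deny pods whose containers explicitly run as root.
deny[msg] {
  input.request.kind.kind == "Pod"
  input.request.object.spec.containers[_].securityContext.runAsUser == 0
  msg := "Containers must not run as root"
}

# Deny pods with any container that has no resource limits defined.
deny[msg] {
  input.request.kind.kind == "Pod"
  container := input.request.object.spec.containers[_]
  not container.resources.limits
  msg := "Every container must define resource limits"
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;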
&lt;h2&gt;
  
  
  Use Case Comparison: When to Use What?
&lt;/h2&gt;

&lt;p&gt;Let’s break it down by use case and see where single-container pods hold up — and where patterns like sidecars and init containers are essential.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|     Use Case                            |   Single Container |   Multi-Container (Sidecar/Init) |   Why / Benefit                                              |
| --------------------------------------- | ------------------- | --------------------------------- | ------------------------------------------------------------- |
| Simple microservice                     |   Recommended       |   Not Needed                     | Keeps it simple, isolated, and independently scalable         |
| App + logging agent                     |   Lacking          |   Use Sidecar                     | Sidecar can ship logs (e.g., Fluent Bit, Promtail) separately |
| Pre-start setup (e.g., init database)   |   Not Possible     |   Use Init Container              | Guarantees setup tasks are done before app runs               |
| App with service mesh                   |   Not Enough       |   Sidecar (e.g., Envoy)           | Enables traffic control, mTLS, tracing via proxies            |
| Deploy policy enforcement               |   With OPA Hook     |   OPA Enforced                   | Prevents insecure or non-compliant pod specs                  |
| App needs config from secrets manager   |   Missing Logic    |   Init or Sidecar                 | Pull secrets securely before runtime                          |
| Application needs tightly coupled logic |   Better Separate  |   Can Use                        | Only if logic cannot be decoupled (e.g., proxy + app pair)    |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Trade-Offs You Should Be Aware Of
&lt;/h2&gt;

&lt;p&gt;While these patterns are powerful, they're not always the right answer. Some trade-offs include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| Pattern             | Pros                                                | Cons                                                                   |
| ------------------- | --------------------------------------------------- | ---------------------------------------------------------------------- |
| **Sidecar**         | Modular, reusable, supports observability           | Resource sharing, lifecycle management complexity                      |
| **Init Container**  | Clean init logic, enforces sequence                 | Adds to pod startup latency                                            |
| **Multi-Container** | Co-location simplifies some tightly coupled tasks   | Harder to scale independently, debugging more complex                  |
| **OPA**             | Declarative policy control, security at deploy time | Learning curve, requires policy writing and admission controller setup |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Should You Use These Patterns?
&lt;/h2&gt;

&lt;p&gt;Yes — when the use case justifies it.&lt;/p&gt;

&lt;p&gt;These patterns exist to make Kubernetes production-grade. But like any engineering decision, use them with intent, not out of trend.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Tip: Start with “One Container per Pod.” Break that rule only when there's a clear, repeatable reason — like logging, proxying, initialization, or security enforcement.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Real-World Scenario: Banking App Traffic Spike
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F42by65c5a2b0kgbkx26n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F42by65c5a2b0kgbkx26n.png" alt="Kubernetes monolithic pod overload" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Problem:&lt;br&gt;
A banking application has high traffic for balance inquiries. Customers now also want OTPs and transaction alerts quickly. The app starts failing under load because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logging service can’t keep up&lt;/li&gt;
&lt;li&gt;OTP system has race conditions&lt;/li&gt;
&lt;li&gt;Application startup is unreliable during node restarts&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Before — Problematic YAML (One Container per Pod)
&lt;/h2&gt;

&lt;p&gt;balance-app.yaml:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: balance-app
spec:
  containers:
    - name: balance-app
      image: mybank/balance-check:v1
      ports:
        - containerPort: 8080

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Before — Problematic Architecture (One Container, Poor Setup)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌────────────────────────┐
│   User (Mobile/Web)    │
└────────┬───────────────┘
         │
         ▼
 ┌────────────────────────────┐
 │   balance-app Pod (v1)     │  ← Monolithic (One container handles everything)
 └────────────────────────────┘
         │
         ├── OTP call → Fails if OTP pod isn't ready
         └── Internal logs → Lost on crash

     🔻 Problems:
     - No readiness probe
     - No log persistence
     - OTP/MS failures break flow
     - Crash = no trace/debug

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What’s going wrong?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| Component           | Issue                                                                                    |
| ------------------- | ---------------------------------------------------------------------------------------- |
|   Logging          | Logs are stored inside the container. Once the container crashes, logs are lost.         |
|   OTP Microservice | It runs in a separate pod. If OTP service isn't ready, the balance-app fails to connect. |
|   No Retry or Wait | There’s no mechanism to wait for OTP or DB to be ready before app starts                 |
|   Hard to Debug     | You can’t access logs post-mortem or know if failures happened at startup                |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fisya6dn88c731q2ekw20.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fisya6dn88c731q2ekw20.png" alt="Resilient kubernete pod design" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  After — Improved YAML (Still One Container per Pod, But Better)
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;You don’t use Kubernetes patterns yet, but you improve by:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mounting logs to a persistent volume&lt;/li&gt;
&lt;li&gt;Adding readiness probes to avoid traffic until app is ready&lt;/li&gt;
&lt;li&gt;Managing environment variables for external dependencies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;balance-app-fixed.yaml:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: balance-app
spec:
  containers:
    - name: balance-app
      image: mybank/balance-check:v2
      ports:
        - containerPort: 8080
      env:
        - name: OTP_SERVICE_URL
          value: "http://otp-service:9090"
        - name: DB_HOST
          value: "bank-db"
      readinessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 5
      volumeMounts:
        - name: logs
          mountPath: /app/logs
  volumes:
    - name: logs
      emptyDir: {}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After — Improved One-Container Setup (Same Pattern, Better Practice)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌────────────────────────┐
│   User (Mobile/Web)    │
└────────┬───────────────┘
         │ HTTPS/API Call
         ▼
 ┌──────────────────────────────┐
 │ balance-app Pod (v2)         │  ← Still 1 container, but:
 │ - readinessProbe             │
 │ - env vars for OTP &amp;amp; DB      │
 │ - volumeMounts for logs      │
 └────────┬───────────────┬─────┘
          │               │
          │               ▼
          │        ┌──────────────┐
          │        │ OTP Service  │  ← Separate Pod
          │        └──────────────┘
          ▼
 ┌──────────────────────────────┐
 │ Logs Persisted (emptyDir)    │  ← Logs retained on crash
 └──────────────────────────────┘

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Explanation of the Fixes&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| Fix                          | Purpose                                                           |
| ---------------------------- | ----------------------------------------------------------------- |
| `readinessProbe`             | Prevents traffic until the app is actually ready                  |
| `env` variables for OTP &amp;amp; DB | External service URLs are injected in a clean, reusable way       |
| `volumeMounts` + `emptyDir`  | Stores logs outside container file system (won’t vanish on crash) |
| Still One Container          | Yes. No sidecar or init yet – just improved hygiene               |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| Version            | Containers | Resiliency | Logging               | Service Coordination      | Deployment Quality |
| ------------------ | ---------- | ---------- | --------------------- | ------------------------- | ------------------ |
|   Before           | 1          | Poor       | Volatile              | Manual, error-prone       | Naïve              |
|   After (Improved) | 1          | Medium     | Volatile but retained | Structured via ENV/Probes | Good baseline      |

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faqakybd6iq9pw26xv3so.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faqakybd6iq9pw26xv3so.png" alt="multi containers in pod" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Scenario Revisited: Solving the Spike with Pod Patterns
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A banking application runs as a &lt;strong&gt;single-container Pod&lt;/strong&gt; serving all responsibilities: balance check, OTP, logs, alerts.&lt;/li&gt;
&lt;li&gt;During traffic spikes (e.g., salary day, multiple users checking balance + receiving OTP), performance drops.&lt;/li&gt;
&lt;li&gt;OTPs are delayed, logs are dropped, CPU spikes — causing user frustration.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  BEFORE: Problematic YAML (Monolithic Pod Pattern)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;banking-app&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;banking-app&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ismailcorp/banking-app:latest&lt;/span&gt;
      &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
      &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;OTP_ENABLED&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LOGGING_ENABLED&lt;/span&gt;
          &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;500m"&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;256Mi"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What’s Wrong Here?
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Monolithic Design&lt;/td&gt;
&lt;td&gt;OTP, logging, app logic all bundled together&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No Scalability&lt;/td&gt;
&lt;td&gt;Can't scale OTP or logging independently&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU Bottlenecks&lt;/td&gt;
&lt;td&gt;OTP spikes during traffic cause app logic to slow down&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hard to Audit&lt;/td&gt;
&lt;td&gt;Logs generated internally, no audit separation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No Policy Controls&lt;/td&gt;
&lt;td&gt;No security policy (e.g., secrets as env vars, no resource governance)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  AFTER: Kubernetes Pattern Approach (Sidecar + Separate Deployments + OPA)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;Deployment&lt;/code&gt;: banking-app (business logic)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;banking-app&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;banking-app&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;banking-app&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;banking-service&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ismailcorp/banking-app:latest&lt;/span&gt;
          &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
          &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;300m"&lt;/span&gt;
              &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;256Mi"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;code&gt;Deployment&lt;/code&gt;: otp-service (Ambassador/Adapter Pattern)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otp-service&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otp-service&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otp-service&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otp-generator&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ismailcorp/otp-service:latest&lt;/span&gt;
          &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9090&lt;/span&gt;
          &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;150m"&lt;/span&gt;
              &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;128Mi"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;code&gt;Sidecar&lt;/code&gt;: log-agent (Sidecar Pattern for Audit Logging)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;banking-app-with-logger&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;banking-service&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ismailcorp/banking-app:latest&lt;/span&gt;
      &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;log-sidecar&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fluent/fluentd:latest&lt;/span&gt;
      &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;logs&lt;/span&gt;
          &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/var/log/app&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;logs&lt;/span&gt;
      &lt;span class="na"&gt;emptyDir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;code&gt;OPA Policy&lt;/code&gt;: Enforce Sidecar and No Plaintext Secrets
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rego"&gt;&lt;code&gt;&lt;span class="ow"&gt;package&lt;/span&gt; &lt;span class="n"&gt;kubernetes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;admission&lt;/span&gt;

&lt;span class="n"&gt;deny&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"Pod"&lt;/span&gt;
  &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;containers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"log-sidecar"&lt;/span&gt;
  &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="s2"&gt;"Missing audit logging sidecar"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;deny&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"Pod"&lt;/span&gt;
  &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;containers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"SECRET_KEY"&lt;/span&gt;
  &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="s2"&gt;"Secrets should be mounted as volumes, not set as ENV"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is enforced using &lt;strong&gt;OPA Gatekeeper&lt;/strong&gt;, which registers with the Kubernetes API server as a validating admission webhook.&lt;/p&gt;
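
&lt;p&gt;With Gatekeeper specifically, raw Rego like the block above is packaged inside a &lt;code&gt;ConstraintTemplate&lt;/code&gt; rather than loaded directly. A minimal sketch under assumptions: the template name &lt;code&gt;k8srequirelogsidecar&lt;/code&gt; is illustrative, and Gatekeeper exposes the reviewed object under &lt;code&gt;input.review&lt;/code&gt; instead of &lt;code&gt;input.request&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequirelogsidecar
spec:
  crd:
    spec:
      names:
        kind: K8sRequireLogSidecar
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequirelogsidecar

        has_log_sidecar {
          input.review.object.spec.containers[_].name == "log-sidecar"
        }

        violation[{"msg": msg}] {
          input.review.kind.kind == "Pod"
          not has_log_sidecar
          msg := "Missing audit logging sidecar"
        }

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;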

&lt;h3&gt;
  
  
  &lt;code&gt;Service&lt;/code&gt;: For App &amp;amp; OTP
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;banking-service&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;banking-app&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;
      &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Service&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otp-service&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;otp-service&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TCP&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;81&lt;/span&gt;
      &lt;span class="na"&gt;targetPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9090&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What Did We Fix?
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Fix Area&lt;/th&gt;
&lt;th&gt;Before (Monolithic)&lt;/th&gt;
&lt;th&gt;After (K8s Patterns + OPA)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Architecture&lt;/td&gt;
&lt;td&gt;Single Container Pod&lt;/td&gt;
&lt;td&gt;Modular Pods with Sidecars and Deployments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logging&lt;/td&gt;
&lt;td&gt;Built-in, hard to audit&lt;/td&gt;
&lt;td&gt;Fluentd Sidecar, independent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OTP Handling&lt;/td&gt;
&lt;td&gt;Built-in&lt;/td&gt;
&lt;td&gt;Separate OTP service (scalable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scalability&lt;/td&gt;
&lt;td&gt;Entire pod only&lt;/td&gt;
&lt;td&gt;OTP and app scale independently&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security Policies&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Enforced via OPA (e.g., no plain secrets)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliance (e.g., PCI-DSS)&lt;/td&gt;
&lt;td&gt;Weak&lt;/td&gt;
&lt;td&gt;Strong audit and policy governance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Conclusion: Which Kubernetes Pod Approach Should You Use?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Start with One Container per Pod
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Cleanest, simplest microservices structure&lt;/li&gt;
&lt;li&gt;Easy to debug, scale, and monitor&lt;/li&gt;
&lt;li&gt;Ideal for small services, MVPs, or early-stage architectures&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Evolve to Kubernetes Pod Patterns as Needed
&lt;/h3&gt;

&lt;p&gt;As complexity grows (especially in fintech, healthcare, or e-commerce), &lt;strong&gt;adopt design patterns&lt;/strong&gt; to solve real-world needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sidecars&lt;/strong&gt; → Add observability (e.g., logging, tracing), service mesh, proxies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Init Containers&lt;/strong&gt; → Ensure startup order, handle configuration or bootstrapping&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OPA/Gatekeeper Policies&lt;/strong&gt; → Enforce security, compliance, and governance&lt;/li&gt;
&lt;/ul&gt;
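<p>As a sketch of the init-container pattern, the Pod below blocks app startup until the OTP service (defined in the Service manifests above, listening on port 81) is reachable. The &lt;code&gt;busybox&lt;/code&gt; image, the &lt;code&gt;nc&lt;/code&gt; probe, and the app image tag are illustrative assumptions:</p>

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: banking-app
  labels:
    app: banking-app
spec:
  initContainers:
    # Runs to completion before the main container starts
    - name: wait-for-otp
      image: busybox:1.36
      command: ["sh", "-c", "until nc -z otp-service 81; do echo waiting for otp-service; sleep 2; done"]
  containers:
    - name: banking-app
      image: banking-app:latest
      ports:
        - containerPort: 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;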

&lt;h3&gt;
  
  
  In High-Stakes Environments (Fintech, Regulated E-Commerce): Use a Hybrid Approach
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;single-container pods&lt;/strong&gt; for most microservices&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;sidecars/init containers&lt;/strong&gt; when functionality justifies it&lt;/li&gt;
&lt;li&gt;Enforce policy-driven controls with &lt;strong&gt;OPA&lt;/strong&gt; or &lt;strong&gt;Kyverno&lt;/strong&gt; to meet compliance (e.g., PCI-DSS, RBI, HIPAA)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Trade-off Summary
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Trade-Offs / Complexity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;One Container per Pod&lt;/td&gt;
&lt;td&gt;Simplicity, scalability&lt;/td&gt;
&lt;td&gt;No in-pod helpers; cross-cutting concerns need other patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pod Patterns (Sidecar, Init)&lt;/td&gt;
&lt;td&gt;Logging, proxying, workflows&lt;/td&gt;
&lt;td&gt;More YAML, more coordination&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OPA/Gatekeeper Policies&lt;/td&gt;
&lt;td&gt;Compliance, audit, guardrails&lt;/td&gt;
&lt;td&gt;Requires policy authoring skills&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Final Advice:&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Build simple&lt;/strong&gt;, &lt;strong&gt;evolve with patterns&lt;/strong&gt;, and &lt;strong&gt;secure with policy&lt;/strong&gt; as your system matures.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>aws</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
