<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Michael Uanikehi</title>
    <description>The latest articles on DEV Community by Michael Uanikehi (@oyiz-michael).</description>
    <link>https://dev.to/oyiz-michael</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F857583%2Fe786792b-b83e-4a28-b3c0-94f66d3d8469.png</url>
      <title>DEV Community: Michael Uanikehi</title>
      <link>https://dev.to/oyiz-michael</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/oyiz-michael"/>
    <language>en</language>
    <item>
      <title>Handling File Uploads in AWS Lambda with Powertools OpenAPI (From Limitation to Production Feature)</title>
      <dc:creator>Michael Uanikehi</dc:creator>
      <pubDate>Mon, 06 Apr 2026 18:13:46 +0000</pubDate>
      <link>https://dev.to/aws-builders/handling-file-uploads-in-aws-lambda-with-powertools-openapi-from-limitation-to-production-feature-4j19</link>
      <guid>https://dev.to/aws-builders/handling-file-uploads-in-aws-lambda-with-powertools-openapi-from-limitation-to-production-feature-4j19</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Handling file uploads in serverless APIs sounds simple until you actually try to do it.&lt;/p&gt;

&lt;p&gt;If you're building APIs with AWS Lambda Powertools and OpenAPI validation, you quickly run into a limitation:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;multipart/form-data&lt;/code&gt; isn’t natively supported in the same way as JSON or form-encoded requests.&lt;/p&gt;

&lt;p&gt;That gap forces teams into workarounds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manual multipart parsing&lt;/li&gt;
&lt;li&gt;Base64 hacks&lt;/li&gt;
&lt;li&gt;Disabling validation entirely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of which are ideal in production systems.&lt;/p&gt;

&lt;p&gt;This article walks through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The real problem&lt;/li&gt;
&lt;li&gt;How the feature was designed&lt;/li&gt;
&lt;li&gt;How you can now use it in practice&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Problem: File Uploads Break the Abstraction
&lt;/h2&gt;

&lt;p&gt;Before this feature, Powertools handled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JSON payloads &lt;/li&gt;
&lt;li&gt;Query parameters &lt;/li&gt;
&lt;li&gt;Headers &lt;/li&gt;
&lt;li&gt;Form data (&lt;code&gt;application/x-www-form-urlencoded&lt;/code&gt;) &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But not:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;multipart/form-data&lt;/code&gt; (file uploads)&lt;/p&gt;

&lt;p&gt;That meant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/upload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;simply didn’t work with OpenAPI validation.&lt;/p&gt;

&lt;p&gt;Instead, developers had to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parse raw request bodies manually&lt;/li&gt;
&lt;li&gt;Disable validation middleware&lt;/li&gt;
&lt;li&gt;Or redesign APIs around non-standard formats&lt;/li&gt;
&lt;/ul&gt;
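
&lt;p&gt;To make the first workaround concrete, here is a minimal sketch of manual multipart parsing using only the Python standard library. It is an illustration, not Powertools code: the &lt;code&gt;parse_multipart&lt;/code&gt; name and the returned dictionary shape are hypothetical, and the event is assumed to follow the API Gateway proxy format (&lt;code&gt;headers&lt;/code&gt;, &lt;code&gt;body&lt;/code&gt;, &lt;code&gt;isBase64Encoded&lt;/code&gt;).&lt;/p&gt;

```python
import base64
from email import message_from_bytes


def parse_multipart(event: dict) -> dict:
    """Parse a multipart/form-data body from a proxy-style event (hypothetical helper)."""
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    content_type = headers["content-type"]  # carries the multipart boundary

    body = event.get("body") or ""
    raw = base64.b64decode(body) if event.get("isBase64Encoded") else body.encode("utf-8")

    # The email parser expects a header block, so prepend the Content-Type header.
    message = message_from_bytes(
        b"Content-Type: " + content_type.encode("utf-8") + b"\r\n\r\n" + raw
    )
    if not message.is_multipart():
        raise ValueError("request body is not multipart")

    parts = {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        parts[name] = {
            "filename": part.get_filename(),  # None for plain form fields
            "content": part.get_payload(decode=True),
        }
    return parts
```

&lt;p&gt;Every handler that accepts uploads ends up carrying some variation of this, with no validation and no OpenAPI schema attached.&lt;/p&gt;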

&lt;p&gt;At scale, this creates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inconsistent APIs&lt;/li&gt;
&lt;li&gt;Security gaps&lt;/li&gt;
&lt;li&gt;Poor developer experience&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Goal: Make File Uploads First-Class
&lt;/h2&gt;

&lt;p&gt;The aim was simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Make file uploads work the same way as &lt;code&gt;Query()&lt;/code&gt;, &lt;code&gt;Header()&lt;/code&gt;, and &lt;code&gt;Form()&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Type-safe&lt;/li&gt;
&lt;li&gt;Automatically validated&lt;/li&gt;
&lt;li&gt;Fully reflected in OpenAPI schema&lt;/li&gt;
&lt;li&gt;Works with Swagger UI&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Solution: &lt;code&gt;File()&lt;/code&gt; Parameter Support
&lt;/h2&gt;

&lt;p&gt;You can now define file inputs like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools.event_handler&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;APIGatewayRestResolver&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools.event_handler.openapi.params&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;File&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;UploadFile&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;APIGatewayRestResolver&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enable_validation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enable_swagger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/swagger&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/upload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;UploadFile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;File&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File to upload&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Two Ways to Work with Files
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Raw bytes
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;File&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;File content only&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2. Rich file object
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;UploadFile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;File&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Content&lt;/li&gt;
&lt;li&gt;Filename&lt;/li&gt;
&lt;li&gt;Content type&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is usually what you want in real systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Combining Files with Form Data
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
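&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools.event_handler.openapi.params&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Form&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/upload-csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;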
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upload_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;UploadFile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;File&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CSV file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
    &lt;span class="n"&gt;separator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Form&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CSV separator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
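    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;file_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Illustrative: split each line on the separator and report the row count&lt;/span&gt;
    &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;separator&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;splitlines&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;row_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;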
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This unlocks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metadata + file uploads&lt;/li&gt;
&lt;li&gt;Real-world API patterns&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What Changed Under the Hood
&lt;/h2&gt;

&lt;p&gt;Supporting this wasn’t just adding a new parameter type.&lt;/p&gt;

&lt;p&gt;It required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multipart parsing logic&lt;/li&gt;
&lt;li&gt;Boundary handling (including WebKit quirks)&lt;/li&gt;
&lt;li&gt;Base64 decoding for Lambda event payloads&lt;/li&gt;
&lt;li&gt;Differentiating file vs form fields&lt;/li&gt;
&lt;li&gt;OpenAPI schema generation (&lt;code&gt;format: binary&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Validation integration&lt;/li&gt;
&lt;/ul&gt;
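
&lt;p&gt;For the schema generation item, the sketch below shows roughly what a multipart request body looks like in OpenAPI 3.0, written as a Python dictionary. The field names (&lt;code&gt;file_data&lt;/code&gt;, &lt;code&gt;separator&lt;/code&gt;) are assumptions borrowed from the CSV example, and the exact document Powertools emits may be structured differently. The part that matters is that a file field is a string with &lt;code&gt;format: binary&lt;/code&gt;.&lt;/p&gt;

```python
import json

# Hand-written illustration of an OpenAPI 3.0 requestBody for a file upload.
# Field names are hypothetical; a generated schema may differ in layout.
upload_request_body = {
    "required": True,
    "content": {
        "multipart/form-data": {
            "schema": {
                "type": "object",
                "properties": {
                    # Files are described as binary strings.
                    "file_data": {
                        "type": "string",
                        "format": "binary",
                        "description": "CSV file",
                    },
                    # Regular form fields keep their normal types.
                    "separator": {
                        "type": "string",
                        "default": ",",
                        "description": "CSV separator",
                    },
                },
                "required": ["file_data"],
            }
        }
    },
}

print(json.dumps(upload_request_body, indent=2))
```

&lt;p&gt;Swagger UI renders a file picker for any property declared this way, which is what makes the upload testable from the browser.&lt;/p&gt;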

&lt;p&gt;There’s also a helpful runtime safeguard:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If multipart requests aren’t properly base64 encoded, a warning is emitted&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This helps catch common misconfigurations early.&lt;/p&gt;
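
&lt;p&gt;A safeguard of this kind could look like the sketch below. It is not the Powertools implementation: the &lt;code&gt;read_request_body&lt;/code&gt; helper and the warning text are made up for illustration, and the event is assumed to use the standard proxy fields &lt;code&gt;body&lt;/code&gt;, &lt;code&gt;headers&lt;/code&gt;, and &lt;code&gt;isBase64Encoded&lt;/code&gt;.&lt;/p&gt;

```python
import base64
import warnings


def read_request_body(event: dict) -> bytes:
    """Return the request body as bytes, warning on un-encoded multipart data.

    Hypothetical helper for illustration only.
    """
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        return base64.b64decode(body)

    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    if headers.get("content-type", "").startswith("multipart/form-data"):
        # Binary content passed through as text is likely to be corrupted.
        warnings.warn(
            "multipart/form-data body is not base64 encoded; "
            "check the binary media type configuration",
            stacklevel=2,
        )
    return body.encode("utf-8")
```

&lt;p&gt;Emitting a warning instead of raising keeps plain-text uploads working while still surfacing the misconfiguration in the logs.&lt;/p&gt;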




&lt;h2&gt;
  
  
  API Gateway Gotcha (Important)
&lt;/h2&gt;

&lt;p&gt;If you're using &lt;strong&gt;REST API (v1)&lt;/strong&gt;, you must configure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Globals&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;Api&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;BinaryMediaTypes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;multipart~1form-data"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this:&lt;br&gt;
File uploads won’t work correctly.&lt;/p&gt;

&lt;p&gt;For:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP API (v2)&lt;/li&gt;
&lt;li&gt;Lambda Function URLs&lt;/li&gt;
&lt;li&gt;ALB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It works out of the box.&lt;/p&gt;




&lt;h2&gt;
  
  
  Before vs After
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Before
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Manual parsing&lt;/li&gt;
&lt;li&gt;No validation&lt;/li&gt;
&lt;li&gt;Custom schemas&lt;/li&gt;
&lt;li&gt;Inconsistent APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  After
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Native &lt;code&gt;File()&lt;/code&gt; support&lt;/li&gt;
&lt;li&gt;OpenAPI validation&lt;/li&gt;
&lt;li&gt;Swagger UI integration&lt;/li&gt;
&lt;li&gt;Cleaner, safer APIs&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;This isn’t just about file uploads.&lt;/p&gt;

&lt;p&gt;It’s about &lt;strong&gt;removing friction from real-world API design&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When basic capabilities are missing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Engineers build workarounds&lt;/li&gt;
&lt;li&gt;Systems become inconsistent&lt;/li&gt;
&lt;li&gt;Reliability suffers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By making file uploads a first-class feature:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;APIs become more predictable&lt;/li&gt;
&lt;li&gt;Validation becomes reliable&lt;/li&gt;
&lt;li&gt;Developer experience improves significantly&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Open Source Insight
&lt;/h2&gt;

&lt;p&gt;One interesting part of this work:&lt;/p&gt;

&lt;p&gt;The implementation evolved through multiple iterations before reaching the final version.&lt;/p&gt;

&lt;p&gt;Large features often:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start broad&lt;/li&gt;
&lt;li&gt;Get refined for maintainability&lt;/li&gt;
&lt;li&gt;Land as cleaner, more focused implementations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That process is what makes open source powerful —&lt;br&gt;
it’s not just about shipping code, but improving it collaboratively.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;If you’re building serverless APIs and dealing with file uploads:&lt;/p&gt;

&lt;p&gt;You no longer need workarounds.&lt;/p&gt;

&lt;p&gt;This feature brings Powertools closer to frameworks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FastAPI&lt;/li&gt;
&lt;li&gt;Django&lt;/li&gt;
&lt;li&gt;Express&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…but in a serverless-native way.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Feature PR: &lt;a href="https://github.com/aws-powertools/powertools-lambda-python/pull/8093" rel="noopener noreferrer"&gt;https://github.com/aws-powertools/powertools-lambda-python/pull/8093&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Original implementation: &lt;a href="https://github.com/aws-powertools/powertools-lambda-python/pull/7132" rel="noopener noreferrer"&gt;https://github.com/aws-powertools/powertools-lambda-python/pull/7132&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Feature request: &lt;a href="https://github.com/aws-powertools/powertools-lambda-python/issues/7124" rel="noopener noreferrer"&gt;https://github.com/aws-powertools/powertools-lambda-python/issues/7124&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>observability</category>
      <category>serverless</category>
      <category>aws</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How I Fixed an SSI-Breaking Bug in NGINX Gateway Fabric</title>
      <dc:creator>Michael Uanikehi</dc:creator>
      <pubDate>Mon, 23 Mar 2026 21:47:24 +0000</pubDate>
      <link>https://dev.to/oyiz-michael/how-i-fixed-an-ssi-breaking-bug-in-nginx-gateway-fabric-307g</link>
      <guid>https://dev.to/oyiz-michael/how-i-fixed-an-ssi-breaking-bug-in-nginx-gateway-fabric-307g</guid>
      <description>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This bug found me, not the other way around.&lt;/p&gt;

&lt;p&gt;I was in the middle of migrating my team's infrastructure from NGINX Ingress Controller to NGINX Gateway Fabric when SSI (Server-Side Includes) stopped working. Subrequests were hitting the wrong backend paths, and pages that relied on SSI includes were silently broken. I could have worked around it, added a flag, patched a config, moved on. Instead, I decided to find where it was actually coming from and fix it at the source.&lt;/p&gt;

&lt;p&gt;This is the story of that bug, what caused it, and the one-line condition that fixed it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Background: What is NGINX Gateway Fabric?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;NGINX Gateway Fabric is a Kubernetes Gateway API implementation backed by NGINX. It translates Kubernetes &lt;code&gt;HTTPRoute&lt;/code&gt; resources into NGINX configuration, handling routing, load balancing, and traffic management. It's the next-generation replacement for NGINX Ingress Controller, built around the Kubernetes Gateway API spec.&lt;/p&gt;

&lt;p&gt;When processing HTTP routes, it generates &lt;code&gt;proxy_pass&lt;/code&gt; directives in NGINX location blocks to forward traffic to upstream backends.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Migration That Surfaced the Bug&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;During the migration from NGINX Ingress Controller to NGINX Gateway Fabric, one of our services used SSI to compose pages from multiple backend responses, with a pattern like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;SSI Test Page&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;
&lt;span class="c"&gt;&amp;lt;!--# include virtual="/include.html" --&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On NGINX Ingress Controller this worked fine. After switching to NGINX Gateway Fabric, the SSI includes were broken. After some investigation, I traced the problem to the generated &lt;code&gt;proxy_pass&lt;/code&gt; directive:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;proxy_pass&lt;/span&gt; &lt;span class="s"&gt;http://my-backend&lt;/span&gt;&lt;span class="nv"&gt;$request_uri&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;$request_uri&lt;/code&gt; in NGINX always holds the &lt;strong&gt;original&lt;/strong&gt; client request URI. It never changes, even during internal subrequests. So when SSI triggered a subrequest to &lt;code&gt;/include.html&lt;/code&gt;, the &lt;code&gt;proxy_pass&lt;/code&gt; directive would still forward &lt;code&gt;$request_uri&lt;/code&gt; (e.g. &lt;code&gt;/&lt;/code&gt;) to the backend, completely ignoring the subrequest's intended path.&lt;/p&gt;
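
&lt;p&gt;A small Python model makes the difference easier to see. This is a deliberately simplified sketch, not NGINX code: the &lt;code&gt;Request&lt;/code&gt; class and &lt;code&gt;forwarded_path&lt;/code&gt; function are hypothetical, and the model ignores query strings and URI normalization.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Request:
    request_uri: str  # original client URI, identical for subrequests
    uri: str          # URI currently being processed


def forwarded_path(request: Request, append_request_uri: bool) -> str:
    """Path sent upstream in this simplified model."""
    # With "$request_uri" appended to proxy_pass, its value is sent as-is.
    # Without a URI part, the URI currently being processed is forwarded.
    return request.request_uri if append_request_uri else request.uri


main_request = Request(request_uri="/", uri="/")
ssi_subrequest = Request(request_uri="/", uri="/include.html")

for label, req in (("main", main_request), ("ssi", ssi_subrequest)):
    print(label, forwarded_path(req, True), forwarded_path(req, False))
```

&lt;p&gt;In this model the SSI subrequest is forwarded as &lt;code&gt;/&lt;/code&gt; when &lt;code&gt;$request_uri&lt;/code&gt; is appended and as &lt;code&gt;/include.html&lt;/code&gt; when it is not, which matches the behaviour I was seeing.&lt;/p&gt;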

&lt;p&gt;&lt;strong&gt;Understanding the Two Location Types&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;NGINX Gateway Fabric uses two types of location blocks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;External locations&lt;/strong&gt; — match the original client request directly. Here, &lt;code&gt;$uri&lt;/code&gt; is already the correctly processed URI. Using &lt;code&gt;$request_uri&lt;/code&gt; is redundant and harmful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Internal locations&lt;/strong&gt; — used for NJS-driven HTTP matching. When a request comes in, NJS evaluates the matching rules and calls &lt;code&gt;r.internalRedirect(match.redirectPath + args)&lt;/code&gt;, which changes &lt;code&gt;$uri&lt;/code&gt; to an internal path like &lt;code&gt;/@rule0-route0&lt;/code&gt;. Without &lt;code&gt;$request_uri&lt;/code&gt;, NGINX would forward that internal path to the backend — so &lt;code&gt;$request_uri&lt;/code&gt; is needed here to restore the original client URI.&lt;/p&gt;

&lt;p&gt;The bug was that the code made no distinction between these two cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;internal/controller/nginx/config/servers.go&lt;/code&gt;, the &lt;code&gt;createProxyPass&lt;/code&gt; function was changed from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Before: applies to ALL non-gRPC locations&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;requestURI&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"$request_uri"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// After: only applies to INTERNAL locations&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;grpc&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;locationType&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;InternalLocationType&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Path&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;requestURI&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"$request_uri"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;External locations&lt;/strong&gt; generate: &lt;code&gt;proxy_pass http://my-backend;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal locations&lt;/strong&gt; still generate: &lt;code&gt;proxy_pass http://my-backend$request_uri;&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
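
&lt;p&gt;For readers who don't write Go, the same decision can be restated in Python. This is only a paraphrase of the condition above for non-gRPC routes: the function names and the boolean parameters are my own, not identifiers from the NGINX Gateway Fabric codebase.&lt;/p&gt;

```python
def request_uri_suffix(grpc: bool, internal_location: bool, has_path_filter: bool) -> str:
    """Return the suffix appended to the proxy_pass target (hypothetical helper)."""
    # Only internal locations need the original client URI restored,
    # and only when no path-rewriting filter is configured.
    if not grpc and internal_location and not has_path_filter:
        return "$request_uri"
    return ""


def http_proxy_pass(backend: str, internal_location: bool, has_path_filter: bool = False) -> str:
    """Build the directive for a plain HTTP route in this simplified model."""
    suffix = request_uri_suffix(False, internal_location, has_path_filter)
    return f"proxy_pass http://{backend}{suffix};"


print(http_proxy_pass("my-backend", internal_location=False))
print(http_proxy_pass("my-backend", internal_location=True))
```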

&lt;p&gt;&lt;strong&gt;Verifying the Fix&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I deployed the fix to a local &lt;code&gt;kind&lt;/code&gt; cluster with an SSI-enabled backend. Before the fix, the SSI include failed because the subrequest hit the wrong path. After the fix, the response correctly returned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;h1&amp;gt;&lt;/span&gt;SSI Test Page&lt;span class="nt"&gt;&amp;lt;/h1&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;p&amp;gt;&lt;/span&gt;This content was included via SSI!&lt;span class="nt"&gt;&amp;lt;/p&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generated NGINX config confirmed the clean &lt;code&gt;proxy_pass&lt;/code&gt; directive with no &lt;code&gt;$request_uri&lt;/code&gt; for the external location.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Takeaways&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-world migrations are great bug finders. Moving from NGINX Ingress Controller to NGINX Gateway Fabric exposed a behavioral difference that tests hadn't caught.&lt;/li&gt;
&lt;li&gt;NGINX's &lt;code&gt;$request_uri&lt;/code&gt; is immutable across the entire request lifecycle, including subrequests. It always reflects the &lt;em&gt;original&lt;/em&gt; client URI.&lt;/li&gt;
&lt;li&gt;Blindly appending &lt;code&gt;$request_uri&lt;/code&gt; to &lt;code&gt;proxy_pass&lt;/code&gt; can interfere with NGINX features that rely on internal subrequests (SSI, auth subrequests, etc.).&lt;/li&gt;
&lt;li&gt;When you hit a bug in open source, you can just fix it. The maintainers were responsive and collaborative throughout the review process.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;PR:&lt;/strong&gt; &lt;a href="https://github.com/nginx/nginx-gateway-fabric/pull/4935" rel="noopener noreferrer"&gt;https://github.com/nginx/nginx-gateway-fabric/pull/4935&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>networking</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Measuring What Matters: Rethinking Serverless Workflows with AWS Lambda Durable Functions</title>
      <dc:creator>Michael Uanikehi</dc:creator>
      <pubDate>Sat, 21 Mar 2026 19:12:41 +0000</pubDate>
      <link>https://dev.to/aws-builders/measuring-what-matters-rethinking-serverless-workflows-with-aws-lambda-durable-functions-406l</link>
      <guid>https://dev.to/aws-builders/measuring-what-matters-rethinking-serverless-workflows-with-aws-lambda-durable-functions-406l</guid>
      <description>&lt;p&gt;Most serverless workflows don’t fail because they can’t scale.&lt;/p&gt;

&lt;p&gt;They fail because when something goes wrong, engineers can’t easily answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Where did this workflow break?&lt;/li&gt;
&lt;li&gt;What state was it in?&lt;/li&gt;
&lt;li&gt;What happened before the failure?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where “measuring what matters” becomes important.&lt;/p&gt;

&lt;p&gt;Not more metrics.&lt;br&gt;
Not more dashboards.&lt;br&gt;
But better ways to understand system behaviour.&lt;/p&gt;

&lt;p&gt;Recently, I explored AWS Lambda Durable Functions, and it exposed something interesting:&lt;/p&gt;

&lt;p&gt;The way we structure workflows directly affects how well we can observe and debug them.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Problem: Orchestration vs Understanding
&lt;/h3&gt;

&lt;p&gt;If you’ve built workflows using AWS Step Functions, you already know the benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clear state transitions&lt;/li&gt;
&lt;li&gt;Visual workflows&lt;/li&gt;
&lt;li&gt;Strong integration with AWS services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But in practice, there’s a trade-off: workflow logic lives outside your application code.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You switch between code and state machine definitions&lt;/li&gt;
&lt;li&gt;Debugging often requires jumping across tools&lt;/li&gt;
&lt;li&gt;Context is split across logs, states, and services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This works well for orchestration.&lt;/p&gt;

&lt;p&gt;But it doesn’t always optimise for debugging and reasoning under pressure.&lt;/p&gt;
&lt;h4&gt;
  
  
  What Durable Functions Change
&lt;/h4&gt;

&lt;p&gt;AWS Lambda Durable Functions take a different approach.&lt;/p&gt;

&lt;p&gt;Instead of defining workflows externally, you write them directly in code.&lt;/p&gt;

&lt;p&gt;The biggest shift is state management. Durable Functions are regular Lambda functions enhanced with stateful execution capabilities. They automatically checkpoint progress, suspend execution for up to one year during long-running tasks, and recover from failures.&lt;/p&gt;

&lt;p&gt;Here’s a simplified example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;aws_lambda_powertools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Logger&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;order_workflow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Processing order &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 1: Validate order
&lt;/span&gt;    &lt;span class="nf"&gt;validate_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Wait for payment confirmation
&lt;/span&gt;    &lt;span class="nf"&gt;wait_for_payment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Process shipment
&lt;/span&gt;    &lt;span class="nf"&gt;ship_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now imagine this workflow:&lt;br&gt;
    • pauses after wait_for_payment()&lt;br&gt;
    • resumes hours later when payment is confirmed&lt;br&gt;
    • continues with full context preserved&lt;/p&gt;
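&lt;p&gt;One way to picture the pause-and-resume behaviour is checkpoint-and-replay. The sketch below is a toy, in-memory illustration only, not the durable functions SDK: &lt;code&gt;CheckpointStore&lt;/code&gt;, &lt;code&gt;run_step&lt;/code&gt;, and &lt;code&gt;PaymentPending&lt;/code&gt; are hypothetical names, and a real runtime would persist this state outside the process.&lt;/p&gt;

```python
# Toy illustration of checkpoint-and-replay. This is NOT a real SDK;
# CheckpointStore, run_step and PaymentPending are hypothetical names.


class CheckpointStore:
    """In-memory stand-in for the durable state a real runtime would persist."""

    def __init__(self):
        self.results = {}

    def run_step(self, name, func, *args):
        # A step that already completed is not executed again on replay;
        # its saved result is returned instead.
        if name in self.results:
            return self.results[name]
        result = func(*args)
        self.results[name] = result
        return result


class PaymentPending(Exception):
    """Raised to suspend the workflow until payment is confirmed."""


def order_workflow(store, order_id, payments, calls):
    def validate(oid):
        calls.append("validate")
        return "valid"

    def ship(oid):
        calls.append("ship")
        return "shipped"

    store.run_step("validate", validate, order_id)
    if order_id not in payments:
        raise PaymentPending(order_id)  # pause here; completed steps are kept
    store.run_step("ship", ship, order_id)
    return {"status": "completed"}


store, payments, calls = CheckpointStore(), set(), []
try:
    order_workflow(store, "o-1", payments, calls)
except PaymentPending:
    pass  # first invocation stops while waiting for payment

payments.add("o-1")  # later, payment is confirmed
result = order_workflow(store, "o-1", payments, calls)  # replay from the top
```

&lt;p&gt;On the second run the validation step is served from the checkpoint rather than executed again, which is the property that lets the workflow continue with its context intact.&lt;/p&gt;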
&lt;h4&gt;
  
  
  Why This Matters for Observability
&lt;/h4&gt;

&lt;p&gt;This isn’t just about developer experience. It changes how you instrument and observe workflows.&lt;/p&gt;

&lt;p&gt;With traditional orchestration, you rely on:&lt;br&gt;
    • Step Function execution graphs&lt;br&gt;
    • Distributed logs&lt;br&gt;
    • External state tracking&lt;/p&gt;

&lt;p&gt;With Durable Functions, you can:&lt;br&gt;
    • Log at each logical step&lt;br&gt;
    • Track state transitions in code&lt;br&gt;
    • Correlate execution paths more naturally&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;logger.info({
    "step": "payment_wait",
    "order_id": order_id,
    "status": "pending"
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now your logs reflect business flow, not just system events.&lt;/p&gt;

&lt;h3&gt;
  
  
  Measuring What Actually Matters
&lt;/h3&gt;

&lt;p&gt;In real systems, useful signals are not:&lt;br&gt;
    • “Lambda ran successfully”&lt;br&gt;
    • “Step transitioned”&lt;/p&gt;

&lt;p&gt;Useful signals are:&lt;br&gt;
    • “Order is waiting on payment”&lt;br&gt;
    • “Workflow resumed after 2 hours”&lt;br&gt;
    • “Shipment failed after approval”&lt;/p&gt;

&lt;p&gt;Durable Functions make it easier to express these signals because your workflow structure matches your mental model.&lt;/p&gt;
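&lt;p&gt;As a rough sketch of what such business-level signals could look like in code, the snippet below emits one JSON log line per step and records how long the workflow waited. It uses only the standard library; &lt;code&gt;log_step&lt;/code&gt; and &lt;code&gt;wait_seconds&lt;/code&gt; are illustrative helper names, and the two-hour gap is simulated rather than measured.&lt;/p&gt;

```python
import json
import logging
import time

logger = logging.getLogger("order_workflow")


def log_step(step, order_id, status, **extra):
    """Emit one JSON log line per business step and return the record."""
    record = {"step": step, "order_id": order_id, "status": status, **extra}
    logger.info(json.dumps(record))
    return record


def wait_seconds(paused_at, resumed_at):
    """How long the workflow was suspended, rounded to whole seconds."""
    return round(resumed_at - paused_at)


paused_at = time.time()
pending = log_step("payment_wait", "o-1", "pending")

# Simulated: pretend the payment confirmation arrived two hours later.
resumed_at = paused_at + 2 * 60 * 60
resumed = log_step(
    "payment_wait",
    "o-1",
    "resumed",
    waited_seconds=wait_seconds(paused_at, resumed_at),
)
```

&lt;p&gt;A log line with &lt;code&gt;waited_seconds&lt;/code&gt; answers "how long was this order stuck on payment?" directly, without reconstructing it from state transitions.&lt;/p&gt;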

&lt;p&gt;That alignment reduces the gap between:&lt;br&gt;
    • what the system is doing&lt;br&gt;
    • and what you think it’s doing&lt;/p&gt;

&lt;h3&gt;
  
  
  Durable Functions vs Step Functions (Practical View)
&lt;/h3&gt;

&lt;p&gt;Use Step Functions when you need:&lt;br&gt;
    • Service orchestration across AWS (Lambda, ECS, Glue)&lt;br&gt;
    • Visual workflows for operations teams&lt;br&gt;
    • Built-in execution tracing&lt;/p&gt;

&lt;p&gt;Use Durable Functions when you need:&lt;br&gt;
    • Workflow logic tightly coupled with application code&lt;br&gt;
    • Faster iteration and local testing&lt;br&gt;
    • Simpler debugging of business logic&lt;/p&gt;

&lt;h4&gt;
  
  
  Trade-offs (Important)
&lt;/h4&gt;

&lt;p&gt;Durable Functions are not a silver bullet.&lt;/p&gt;

&lt;p&gt;You lose:&lt;br&gt;
    • visual workflow diagrams&lt;br&gt;
    • some operational visibility for non-engineers&lt;/p&gt;

&lt;p&gt;And you gain:&lt;br&gt;
    • code-level control&lt;br&gt;
    • simpler reasoning&lt;br&gt;
    • tighter integration with your application&lt;/p&gt;

&lt;h3&gt;
  
  
  Final Thoughts
&lt;/h3&gt;

&lt;p&gt;Reliable systems are not just systems that run. They’re systems that engineers can:&lt;br&gt;
    • understand&lt;br&gt;
    • debug&lt;br&gt;
    • trust during incidents&lt;/p&gt;

&lt;p&gt;Durable Functions don’t magically solve observability, but they remove a layer of abstraction that often gets in the way.&lt;/p&gt;

&lt;p&gt;And that makes it easier to measure what actually matters.&lt;/p&gt;

&lt;p&gt;If you’re already using Step Functions, you don’t need to replace them. But if your workflows feel harder to reason about than they should…&lt;/p&gt;

&lt;p&gt;It might be worth trying a different approach.&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>lambda</category>
      <category>stepfunctions</category>
      <category>aws</category>
    </item>
    <item>
      <title>Troubleshooting EFS Mount Failures in EKS: The IAM Mount Option Mystery</title>
      <dc:creator>Michael Uanikehi</dc:creator>
      <pubDate>Wed, 14 Jan 2026 00:57:48 +0000</pubDate>
      <link>https://dev.to/oyiz-michael/troubleshooting-efs-mount-failures-in-eks-the-iam-mount-option-mystery-4e6h</link>
      <guid>https://dev.to/oyiz-michael/troubleshooting-efs-mount-failures-in-eks-the-iam-mount-option-mystery-4e6h</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If you're getting &lt;code&gt;mount.nfs4: access denied by server while mounting 127.0.0.1:/&lt;/code&gt; when mounting EFS volumes in EKS, and your security groups are correct, you're probably missing the &lt;code&gt;iam&lt;/code&gt; mount option in your PersistentVolume definition when using an EFS file system policy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;We were integrating a new reporting service into our EKS cluster that needed to write reports to a shared EFS file system. The pod kept failing to mount with this cryptic error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MountVolume.SetUp failed for volume "efs-pv": rpc error: code = Internal desc = Could not mount "{efs_id}:/"
Output: mount.nfs4: access denied by server while mounting 127.0.0.1:/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Investigation Journey
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Initial Suspicions (All Wrong)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Theory 1: Security Group Issues&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verified NFS traffic (TCP 2049) allowed between worker nodes and EFS mount targets&lt;/li&gt;
&lt;li&gt;Mount targets existed in all Availability Zones&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; Security groups were perfect. Not the issue.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Theory 2: EFS File System Policy&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We had recently added an IAM-based file system policy to restrict access&lt;/li&gt;
&lt;li&gt;Policy included conditions like &lt;code&gt;aws:PrincipalArn&lt;/code&gt; to whitelist specific IAM roles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The breakthrough:&lt;/strong&gt; Removing the policy made it work!&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Eureka Moment
&lt;/h3&gt;

&lt;p&gt;Reading the &lt;a href="https://repost.aws/knowledge-center/eks-troubleshoot-efs-volume-mount-issues" rel="noopener noreferrer"&gt;AWS EFS troubleshooting documentation&lt;/a&gt;, I found this gem:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;If you don't add the iam mount option with a restrictive file system policy, then the pods fail with the following error message:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;mount.nfs4: access denied by server while mounting 127.0.0.1:/&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Root Cause Analysis
&lt;/h2&gt;

&lt;p&gt;The issue had &lt;strong&gt;three interconnected parts&lt;/strong&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. EFS File System Policy Conditions
&lt;/h3&gt;

&lt;p&gt;We used &lt;code&gt;aws:PrincipalArn&lt;/code&gt; in our policy conditions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Condition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ArnLike"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"aws:PrincipalArn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:iam::123456789012:role/worker-node-role"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:iam::123456789012:role/efs-csi-driver-role"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Per AWS docs, &lt;code&gt;aws:PrincipalArn&lt;/code&gt; and most IAM condition keys &lt;strong&gt;are NOT enforced&lt;/strong&gt; for NFS client mounts to EFS. Only these conditions work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;aws:SecureTransport&lt;/code&gt; (Boolean)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;aws:SourceIp&lt;/code&gt; (String - public IPs only)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;elasticfilesystem:AccessPointArn&lt;/code&gt; (String)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;elasticfilesystem:AccessedViaMountTarget&lt;/code&gt; (Boolean)&lt;/li&gt;
&lt;/ul&gt;
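&lt;p&gt;A small check like the following could flag policies that lean on other keys. It is a sketch under the assumption that the four keys listed above are the complete set; &lt;code&gt;unsupported_condition_keys&lt;/code&gt; is a hypothetical helper, and it compares key names by exact match.&lt;/p&gt;

```python
# Condition keys the article lists as enforced for NFS client mounts.
SUPPORTED_KEYS = {
    "aws:SecureTransport",
    "aws:SourceIp",
    "elasticfilesystem:AccessPointArn",
    "elasticfilesystem:AccessedViaMountTarget",
}


def unsupported_condition_keys(policy):
    """Return condition keys in the policy that are outside SUPPORTED_KEYS."""
    found = set()
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be wrapped in a list
        statements = [statements]
    for statement in statements:
        # Condition maps an operator (e.g. "ArnLike") to a {key: value} block.
        for operator_block in statement.get("Condition", {}).values():
            for key in operator_block:
                if key not in SUPPORTED_KEYS:
                    found.add(key)
    return sorted(found)


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": ["elasticfilesystem:ClientMount"],
            "Condition": {
                "ArnLike": {
                    "aws:PrincipalArn": [
                        "arn:aws:iam::123456789012:role/worker-node-role"
                    ]
                },
                "Bool": {"aws:SecureTransport": "true"},
            },
        }
    ],
}

flagged = unsupported_condition_keys(policy)
print(flagged)
```

&lt;p&gt;Running this against the policy we started with flags &lt;code&gt;aws:PrincipalArn&lt;/code&gt;, which is the condition that gave us a false sense of control.&lt;/p&gt;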

&lt;h3&gt;
  
  
  2. Missing IAM Mount Option
&lt;/h3&gt;

&lt;p&gt;Our PersistentVolume was missing the &lt;code&gt;iam&lt;/code&gt; mount option:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# BEFORE - Missing iam mount option&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PersistentVolume&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;efs-pv&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;storageClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-efs-csi-sc&lt;/span&gt;
  &lt;span class="na"&gt;csi&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;efs.csi.aws.com&lt;/span&gt;
    &lt;span class="na"&gt;volumeHandle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{efs_id}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without &lt;code&gt;iam&lt;/code&gt;, the EFS CSI driver doesn't authenticate using IAM roles, so any file system policy with IAM restrictions fails.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The EFS Mount Flow
&lt;/h3&gt;

&lt;p&gt;When using the EFS CSI driver with the &lt;code&gt;tls&lt;/code&gt; mount option:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Node-level mount&lt;/strong&gt; happens first (via worker node IAM role)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Without&lt;/strong&gt; &lt;code&gt;iam&lt;/code&gt; option → Anonymous NFS mount&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With&lt;/strong&gt; &lt;code&gt;iam&lt;/code&gt; option → Authenticated mount using IAM role credentials&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Solution
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Fix 1: Add &lt;code&gt;mountOptions: [tls, iam]&lt;/code&gt; to the PersistentVolume
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;#  AFTER - With iam mount option&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PersistentVolume&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;efs-pv&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;storageClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-efs-csi-sc&lt;/span&gt;
  &lt;span class="na"&gt;mountOptions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;tls&lt;/span&gt; &lt;span class="c1"&gt;# Encryption in transit&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;iam&lt;/span&gt; &lt;span class="c1"&gt;# Enable IAM authentication&lt;/span&gt;
  &lt;span class="na"&gt;csi&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;efs.csi.aws.com&lt;/span&gt;
    &lt;span class="na"&gt;volumeHandle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{efs_id}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
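&lt;p&gt;To catch this before a pod ever fails to mount, a manifest check along these lines could run in CI. It is a minimal sketch that assumes the PersistentVolume has already been loaded into a Python dict; &lt;code&gt;missing_mount_options&lt;/code&gt; is a hypothetical helper, not part of the EFS CSI driver or any Kubernetes client.&lt;/p&gt;

```python
REQUIRED_OPTIONS = {"tls", "iam"}


def missing_mount_options(pv):
    """Return required mount options absent from a PersistentVolume manifest."""
    spec = pv.get("spec", {})
    if spec.get("csi", {}).get("driver") != "efs.csi.aws.com":
        return []  # not an EFS CSI volume; nothing to check
    present = set(spec.get("mountOptions") or [])
    return sorted(REQUIRED_OPTIONS - present)


# The "before" manifest from this article, expressed as a dict.
before = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": "efs-pv"},
    "spec": {
        "storageClassName": "aws-efs-csi-sc",
        "csi": {"driver": "efs.csi.aws.com", "volumeHandle": "{efs_id}"},
    },
}

# The "after" manifest: same volume with mountOptions added.
after = {**before, "spec": {**before["spec"], "mountOptions": ["tls", "iam"]}}

missing_before = missing_mount_options(before)
missing_after = missing_mount_options(after)
```

&lt;p&gt;The check only makes sense for clusters that use a restrictive file system policy; volumes without one would not need &lt;code&gt;iam&lt;/code&gt;.&lt;/p&gt;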



&lt;h3&gt;
  
  
  Fix 2: Use Only Supported EFS Condition Keys
&lt;/h3&gt;

&lt;p&gt;If you need a file system policy, use only the supported conditions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Principal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"AWS"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"elasticfilesystem:ClientMount"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"elasticfilesystem:ClientWrite"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"elasticfilesystem:ClientRootAccess"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/{efs_id}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Condition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Bool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"elasticfilesystem:AccessedViaMountTarget"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"true"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"aws:SecureTransport"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"true"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This policy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Requires TLS encryption (&lt;code&gt;aws:SecureTransport&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Requires access via mount targets (prevents direct IP access)&lt;/li&gt;
&lt;li&gt;Uses only supported condition keys&lt;/li&gt;
&lt;li&gt;Relies on security groups for network-level access control&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Learnings
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;IAM Mount Option is Required for IAM Authorization&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Without &lt;code&gt;-o iam&lt;/code&gt;, EFS mounts are anonymous, so a file system policy that grants access only to specific IAM principals will deny the mount.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Not All IAM Conditions Work with EFS&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Only 4 condition keys are enforced for NFS mounts. Using others creates a false sense of security.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Layer Your Security Properly&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Network Layer:&lt;/strong&gt; Security groups (who can reach mount targets)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IAM Layer:&lt;/strong&gt; IAM policies on roles (what actions are allowed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File System Layer:&lt;/strong&gt; EFS policy (additional restrictions)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Read the Error Logs Carefully&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The error message mentioned &lt;code&gt;127.0.0.1&lt;/code&gt; because the EFS mount helper creates a local stunnel proxy for TLS. The actual connection fails at the IAM authorization layer, not the network layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. &lt;strong&gt;Test Mount Operations Manually&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;SSH to a worker node and test the mount with the EFS mount helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;mount &lt;span class="nt"&gt;-t&lt;/span&gt; efs &lt;span class="nt"&gt;-o&lt;/span&gt; tls,iam &lt;span class="o"&gt;{&lt;/span&gt;efs_id&lt;span class="o"&gt;}&lt;/span&gt;:/ /mnt/test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This validates the configuration outside of Kubernetes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;What seemed like a complex IAM policy issue turned out to be a missing mount option. The key insight was understanding that EFS file system policies &lt;strong&gt;require explicit IAM authentication&lt;/strong&gt; via the &lt;code&gt;iam&lt;/code&gt; mount option, and that most IAM condition keys don't apply to NFS mounts.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>kubernetes</category>
      <category>security</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Measuring What Matters: Adding Multiple Dimension Sets to AWS Lambda Powertools</title>
      <dc:creator>Michael Uanikehi</dc:creator>
      <pubDate>Mon, 12 Jan 2026 23:36:20 +0000</pubDate>
      <link>https://dev.to/aws-builders/measuring-what-matters-adding-multiple-dimension-sets-to-aws-lambda-powertools-aob</link>
      <guid>https://dev.to/aws-builders/measuring-what-matters-adding-multiple-dimension-sets-to-aws-lambda-powertools-aob</guid>
      <description>&lt;p&gt;Most production systems don’t fail because they lack metrics.&lt;br&gt;
They fail because the metrics they do have flatten reality.&lt;/p&gt;

&lt;p&gt;Over time, I kept seeing the same pattern across teams and architectures: engineers had plenty of dashboards, yet struggled to answer simple questions during incidents.&lt;/p&gt;

&lt;p&gt;Not because the data wasn’t there, but because it was aggregated in ways that hid meaningful differences.&lt;/p&gt;

&lt;p&gt;This is the problem that led to the addition of multiple dimension sets in AWS Lambda Powertools for Python.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Real Problem: Aggregation, Not Instrumentation
&lt;/h3&gt;

&lt;p&gt;CloudWatch’s Embedded Metric Format (EMF) has long supported dimensional metrics.&lt;br&gt;
In theory, this allows teams to slice metrics by environment, region, customer type, or deployment shape.&lt;/p&gt;

&lt;p&gt;In practice, most teams are forced to choose one aggregation view per metric emission.&lt;/p&gt;

&lt;p&gt;You can measure latency by:&lt;br&gt;
    • service + region, or&lt;br&gt;
    • service + environment, or&lt;br&gt;
    • service + customer_type&lt;/p&gt;

&lt;p&gt;But not all of them at once unless you emit the same metric repeatedly with different dimension combinations.&lt;/p&gt;

&lt;p&gt;That trade-off shows up quickly in real systems:&lt;br&gt;
    • Metrics get duplicated&lt;br&gt;
    • Code becomes verbose and fragile&lt;br&gt;
    • CloudWatch costs increase&lt;br&gt;
    • Important aggregation paths are missing when you need them most&lt;/p&gt;

&lt;p&gt;The result isn’t just inefficiency; it’s lost confidence during incidents.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Feature Request That Captured the Pattern
&lt;/h3&gt;

&lt;p&gt;This limitation wasn’t theoretical.&lt;/p&gt;

&lt;p&gt;In early 2025, a community member opened a feature request in the AWS Lambda Powertools repository:&lt;/p&gt;

&lt;p&gt;“Add support for multiple dimension sets to the same Metrics instance”&lt;br&gt;
(Issue #6198)&lt;/p&gt;

&lt;p&gt;The use case was clear:&lt;br&gt;
    • A Lambda deployed across multiple regions and environments&lt;br&gt;
    • Metrics that needed to be aggregated by environment, region, and both&lt;br&gt;
    • One metric value, many meaningful views&lt;/p&gt;

&lt;p&gt;The request also highlighted an important fact:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The EMF specification already supports this.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Dimensions field in EMF is defined as an &lt;strong&gt;array of arrays&lt;/strong&gt;, with each inner array representing a different aggregation view.&lt;/p&gt;

&lt;p&gt;Other Powertools runtimes (TypeScript, Java, .NET) already exposed this capability.&lt;/p&gt;

&lt;p&gt;Python didn’t.&lt;/p&gt;
&lt;h4&gt;
  
  
  From Feature Request to Production-Ready Implementation
&lt;/h4&gt;

&lt;p&gt;After maintainers aligned on the approach, I picked up the work to implement this feature for the Python runtime.&lt;/p&gt;

&lt;p&gt;The goal wasn’t to invent something new.&lt;br&gt;
It was to:&lt;br&gt;
    • Align Python with the EMF specification&lt;br&gt;
    • Reach feature parity with other Powertools runtimes&lt;br&gt;
    • Deliver a clean, intuitive API that felt natural to existing users&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design principles&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before touching code, a few constraints guided the implementation:&lt;br&gt;
    • &lt;em&gt;Backward compatibility&lt;/em&gt; - existing add_dimension() behavior must remain unchanged&lt;br&gt;
    • &lt;em&gt;Clear mental model&lt;/em&gt; - no hidden side effects or ambiguous APIs&lt;br&gt;
    • &lt;em&gt;Spec-aligned output&lt;/em&gt; - serialized EMF must match CloudWatch’s expectations&lt;br&gt;
    • &lt;em&gt;Production safety&lt;/em&gt; - strict validation and cleanup between invocations&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Resulting API&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The final design mirrors the proven pattern from the TypeScript implementation:&lt;br&gt;
    • add_dimension() → adds to the primary dimension set&lt;br&gt;
    • add_dimensions() → creates a new aggregation view&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example usage&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from aws_lambda_powertools import Metrics
from aws_lambda_powertools.metrics import MetricUnit

metrics = Metrics(namespace="ServerlessAirline", service="booking")

metrics.add_dimensions({"environment": "prod", "region": "us-east-1"})
metrics.add_dimensions({"environment": "prod"})
metrics.add_dimensions({"region": "us-east-1"})

metrics.add_metric(
    name="SuccessfulRequests",
    unit=MetricUnit.Count,
    value=100
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With a single metric emission, CloudWatch can now aggregate across:&lt;br&gt;
    • environment + region&lt;br&gt;
    • environment only&lt;br&gt;
    • region only&lt;/p&gt;

&lt;p&gt;No duplicate metrics.&lt;br&gt;
No parallel pipelines.&lt;br&gt;
No guesswork.&lt;/p&gt;
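&lt;p&gt;For orientation, here is roughly what an EMF log event carrying those three dimension sets could look like. The sketch builds the payload by hand with the standard library rather than through Powertools; &lt;code&gt;build_emf_event&lt;/code&gt; is a hypothetical helper, and it leaves out the &lt;code&gt;service&lt;/code&gt; dimension and any defaults the library would add.&lt;/p&gt;

```python
import json
import time


def build_emf_event(namespace, dimension_sets, metric_name, unit, value):
    """Build a single EMF log event that declares several dimension sets."""
    dimension_values = {}
    for dims in dimension_sets:
        dimension_values.update(dims)  # collect every referenced key once
    return {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # Array of arrays: one inner list of keys per aggregation view.
                    "Dimensions": [sorted(dims) for dims in dimension_sets],
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        # Dimension values and the metric value live at the root of the event.
        **dimension_values,
        metric_name: value,
    }


event = build_emf_event(
    namespace="ServerlessAirline",
    dimension_sets=[
        {"environment": "prod", "region": "us-east-1"},
        {"environment": "prod"},
        {"region": "us-east-1"},
    ],
    metric_name="SuccessfulRequests",
    unit="Count",
    value=100,
)
print(json.dumps(event, indent=2))
```

&lt;p&gt;The metric value appears once in the event, while the &lt;code&gt;Dimensions&lt;/code&gt; array names each combination CloudWatch should aggregate by.&lt;/p&gt;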

&lt;h4&gt;
  
  
  What Changed Under the Hood
&lt;/h4&gt;

&lt;p&gt;The implementation introduced a few key changes:&lt;br&gt;
    • Tracked multiple dimension sets internally&lt;br&gt;
    • Updated EMF serialization to emit all dimension arrays&lt;br&gt;
    • Ensured default dimensions are automatically included&lt;br&gt;
    • Enforced CloudWatch’s 30-dimension limit&lt;br&gt;
    • Handled duplicate keys deterministically (“last value wins”)&lt;br&gt;
    • Cleared dimension state safely between invocations&lt;/p&gt;

&lt;p&gt;To ensure reliability, the change shipped with 13 new tests, covering:&lt;br&gt;
    • Multiple dimension set creation&lt;br&gt;
    • Validation and edge cases&lt;br&gt;
    • Integration with existing metrics features&lt;br&gt;
    • High-resolution metrics compatibility&lt;/p&gt;

&lt;p&gt;All existing tests passed, code quality checks succeeded, and maintainers approved the change for merge.&lt;/p&gt;

&lt;h4&gt;
  
  
  Why This Matters in Production
&lt;/h4&gt;

&lt;p&gt;This feature doesn’t add more metrics.&lt;/p&gt;

&lt;p&gt;It makes existing metrics more truthful.&lt;/p&gt;

&lt;p&gt;When teams can express multiple aggregation views at the point of emission:&lt;br&gt;
    • Incident response becomes faster&lt;br&gt;
    • Dashboards become simpler&lt;br&gt;
    • Alerting becomes more precise&lt;br&gt;
    • Engineers trust what they see&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics are contracts.&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;If they can’t reflect how users actually experience the system, they quietly fail.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Multiple dimension sets don’t eliminate operational problems, but they remove a blind spot that many teams didn’t realize they had.&lt;/p&gt;

&lt;p&gt;The full implementation, tests, and maintainer review can be found in the merged pull request:&lt;br&gt;
&lt;a href="https://github.com/aws-powertools/powertools-lambda-python/pull/7848" rel="noopener noreferrer"&gt;https://github.com/aws-powertools/powertools-lambda-python/pull/7848&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Open Source as Shared Problem-Solving
&lt;/h3&gt;

&lt;p&gt;What made this contribution meaningful wasn’t just the code.&lt;/p&gt;

&lt;p&gt;It was the process:&lt;br&gt;
    • A well-documented community feature request&lt;br&gt;
    • Maintainer collaboration across runtimes&lt;br&gt;
    • Alignment with existing specifications&lt;br&gt;
    • A solution designed for long-term maintainability&lt;/p&gt;

&lt;p&gt;This is open source at its best: turning recurring operational pain into shared infrastructure improvements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Measuring What Actually Matters
&lt;/h3&gt;

&lt;p&gt;Reliability isn’t about collecting more data.&lt;br&gt;
It’s about choosing the signals that deserve to exist.&lt;/p&gt;

&lt;p&gt;This change helps teams measure systems the way users experience them — not just the way dashboards prefer.&lt;/p&gt;

&lt;p&gt;And that difference matters.&lt;/p&gt;

&lt;p&gt;If you’re duplicating EMF emissions just to get different aggregation views, this should make your metrics simpler, clearer, and more reliable.&lt;/p&gt;

&lt;p&gt;And if you run into edge cases, open an issue.&lt;/p&gt;

&lt;p&gt;That’s how this ecosystem keeps improving.&lt;/p&gt;

</description>
      <category>observability</category>
      <category>aws</category>
      <category>serverless</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Securing Cross-Account AWS Operations: Adding External ID Support to CDK AwsCustomResource</title>
      <dc:creator>Michael Uanikehi</dc:creator>
      <pubDate>Thu, 20 Nov 2025 01:05:15 +0000</pubDate>
      <link>https://dev.to/aws-builders/securing-cross-account-aws-operations-adding-external-id-support-to-awscustomresource-l3p</link>
      <guid>https://dev.to/aws-builders/securing-cross-account-aws-operations-adding-external-id-support-to-awscustomresource-l3p</guid>
      <description>&lt;p&gt;I recently contributed to the AWS Cloud Development Kit (CDK) by implementing External ID support for AwsCustomResource, a feature that enhances security for cross-account AWS operations. The pull request &lt;a href="https://github.com/aws/aws-cdk/pull/35252" rel="noopener noreferrer"&gt;#35252&lt;/a&gt; was merged into the main branch after a comprehensive review process, addressing a critical security gap identified in issue &lt;a href="https://github.com/aws/aws-cdk/issues/34018" rel="noopener noreferrer"&gt;#34018&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This article walks through the problem, the solution, and the engineering decisions that went into this contribution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem: Confused Deputy Attacks&lt;/strong&gt;&lt;br&gt;
What is a Confused Deputy Attack?&lt;br&gt;
In multi-account AWS environments, services often need to assume roles across accounts. A "confused deputy" attack occurs when a malicious actor tricks a service (the "deputy") into performing unauthorized actions by exploiting the trust relationship between accounts.&lt;/p&gt;

&lt;p&gt;Consider this scenario:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your Lambda function assumes a role in Account B using sts:AssumeRole&lt;/li&gt;
&lt;li&gt;An attacker discovers your function's configuration&lt;/li&gt;
&lt;li&gt;The attacker creates their own resource that tricks your function into assuming a role in their account instead&lt;/li&gt;
&lt;li&gt;Your function unknowingly performs operations in the attacker's account&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Security Gap in AwsCustomResource&lt;br&gt;
Before this contribution, AwsCustomResource supported cross-account operations via assumedRoleArn but lacked support for External IDs—a critical AWS security best practice. This forced developers to choose between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Security: Skip cross-account functionality&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Functionality: Accept increased security risk&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Solution: External ID Support&lt;/strong&gt;&lt;br&gt;
What is an External ID?&lt;br&gt;
An External ID is a secret value that must be provided when assuming a role. It acts as a second factor of authentication, ensuring that only entities with both the correct ARN and the secret can assume the role.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Trust policy in the assumed role (Account B)
{
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::ACCOUNT-A:role/CustomResourceRole" },
  "Action": "sts:AssumeRole",
  "Condition": {
    "StringEquals": {
      "sts:ExternalId": "my-secret-external-id-12345"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
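&lt;p&gt;To make the effect of that condition concrete, here is a deliberately simplified Python sketch of the check. It is not how IAM is implemented: the function name trust_policy_allows and the single-statement model are illustrative assumptions, and only the StringEquals condition on sts:ExternalId is modelled.&lt;/p&gt;

```python
# Toy model of the trust-policy check above. Real IAM evaluation is far richer;
# this only illustrates why a caller without the External ID is rejected.
TRUST_STATEMENT = {
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::ACCOUNT-A:role/CustomResourceRole"},
    "Action": "sts:AssumeRole",
    "Condition": {"StringEquals": {"sts:ExternalId": "my-secret-external-id-12345"}},
}


def trust_policy_allows(statement, caller_arn, external_id=None):
    """Return True if this single statement would allow the AssumeRole call."""
    if statement["Effect"] != "Allow":
        return False
    if statement["Principal"]["AWS"] != caller_arn:
        return False
    # Every StringEquals key must match the value supplied with the request.
    request_context = {"sts:ExternalId": external_id}
    required = statement.get("Condition", {}).get("StringEquals", {})
    return all(request_context.get(key) == value for key, value in required.items())


caller = "arn:aws:iam::ACCOUNT-A:role/CustomResourceRole"
print(trust_policy_allows(TRUST_STATEMENT, caller, "my-secret-external-id-12345"))  # True
print(trust_policy_allows(TRUST_STATEMENT, caller))  # False: no External ID supplied
```

&lt;p&gt;The second call fails even though the caller is the trusted principal, which is the property that protects Account B when only the role ARN is known.&lt;/p&gt;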



&lt;p&gt;Implementation Overview&lt;br&gt;
The implementation spans three key areas:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;CDK Construct Interface (aws-custom-resource.ts)
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export interface AwsSdkCall {
  // ... existing properties

  /**
   * The external ID to use when assuming the role for this call.
   * 
   * An external ID is a secret identifier that you define and share with the
   * account owner. It helps prevent the "confused deputy" problem in cross-account
   * scenarios.
   *
   * @see https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html
   * @default - No external ID is used
   */
  readonly externalId?: string;

  /**
   * The ARN of the role to assume for this call.
   * 
   * When specified with externalId, both must be provided to the AssumeRole call.
   *
   * @default - No role is assumed (calls are made with the Lambda function's role)
   */
  readonly assumedRoleArn?: string;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Key design decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Optional property: Maintains backward compatibility&lt;/li&gt;
&lt;li&gt;Comprehensive documentation: Explains security benefits and links to AWS best practices&lt;/li&gt;
&lt;li&gt;Paired with assumedRoleArn: Only applies when cross-account operations are configured&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Lambda Handler (custom-resource-handlers/lib/custom-resources/utils.ts)
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async function getCredentials(assumedRoleArn?: string, externalId?: string): Promise&amp;lt;AWS.Credentials | undefined&amp;gt; {
  if (!assumedRoleArn) {
    return undefined;
  }

  const sts = new AWS.STS();
  const timestamp = new Date().getTime();

  const params: AWS.STS.AssumeRoleRequest = {
    RoleArn: assumedRoleArn,
    RoleSessionName: `AwsSdkCall-${timestamp}`,
  };

  // Add External ID if provided
  if (externalId) {
    params.ExternalId = externalId;
  }

  const { Credentials: assumedCredentials } = await sts.assumeRole(params).promise();

  if (!assumedCredentials) {
    throw new Error('Failed to assume role');
  }

  return new AWS.Credentials({
    accessKeyId: assumedCredentials.AccessKeyId,
    secretAccessKey: assumedCredentials.SecretAccessKey,
    sessionToken: assumedCredentials.SessionToken,
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The implementation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Passes External ID to STS AssumeRole calls when provided&lt;/li&gt;
&lt;li&gt;Maintains backward compatibility (works without External ID)&lt;/li&gt;
&lt;li&gt;Uses existing AWS SDK patterns for consistency&lt;/li&gt;
&lt;/ul&gt;
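&lt;p&gt;For readers more at home in Python, the same "only send ExternalId when it is set" pattern can be sketched as follows. This is not the CDK handler code: build_assume_role_params and get_credentials are hypothetical names, and sts_client is assumed to be boto3.client("sts") or any object exposing the same assume_role method.&lt;/p&gt;

```python
import time


def build_assume_role_params(assumed_role_arn, external_id=None):
    """Build the keyword arguments for an STS AssumeRole call."""
    params = {
        "RoleArn": assumed_role_arn,
        "RoleSessionName": f"AwsSdkCall-{int(time.time() * 1000)}",
    }
    # Only send ExternalId when one was configured, so existing callers are unaffected.
    if external_id:
        params["ExternalId"] = external_id
    return params


def get_credentials(sts_client, assumed_role_arn=None, external_id=None):
    """Return temporary credentials, or None when no role should be assumed."""
    if not assumed_role_arn:
        return None
    params = build_assume_role_params(assumed_role_arn, external_id)
    response = sts_client.assume_role(**params)
    return response["Credentials"]
```

&lt;p&gt;Keeping the parameter building in its own function makes the backward-compatible branch easy to unit test without calling AWS.&lt;/p&gt;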

&lt;ol start="3"&gt;
&lt;li&gt;Type Safety (construct-types.ts)
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export interface AwsSdkCall {
  service: string;
  action: string;
  parameters?: any;
  physicalResourceId?: PhysicalResourceId;
  assumedRoleArn?: string;
  externalId?: string;  // Added for type safety
  region?: string;
  apiVersion?: string;
  outputPaths?: string[];
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Ensures type consistency between the CDK construct and Lambda handler.&lt;/p&gt;

&lt;p&gt;Usage Examples&lt;br&gt;
Basic Cross-Account Operation with External ID&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { AwsCustomResource, AwsCustomResourcePolicy, PhysicalResourceId } from 'aws-cdk-lib/custom-resources';
import { PolicyStatement, Effect } from 'aws-cdk-lib/aws-iam';

const customResource = new AwsCustomResource(this, 'CrossAccountOperation', {
  onCreate: {
    service: 'S3',
    action: 'putObject',
    parameters: {
      Bucket: 'cross-account-bucket',
      Key: 'data.json',
      Body: JSON.stringify({ message: 'Hello from Account A' }),
    },
    physicalResourceId: PhysicalResourceId.of('cross-account-s3-object'),
    assumedRoleArn: 'arn:aws:iam::ACCOUNT-B:role/CrossAccountS3Role',
    externalId: 'my-secret-external-id-12345',  // Security enhancement!
  },
  policy: AwsCustomResourcePolicy.fromStatements([
    new PolicyStatement({
      effect: Effect.ALLOW,
      actions: ['sts:AssumeRole'],
      resources: ['arn:aws:iam::ACCOUNT-B:role/CrossAccountS3Role'],
    }),
  ]),
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Different External IDs per Operation&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const multiOpResource = new AwsCustomResource(this, 'MultiOpResource', {
  onCreate: {
    service: 'DynamoDB',
    action: 'putItem',
    parameters: { /* ... */ },
    assumedRoleArn: 'arn:aws:iam::ACCOUNT-B:role/DynamoDBRole',
    externalId: 'create-external-id-abc123',
  },
  onUpdate: {
    service: 'DynamoDB',
    action: 'updateItem',
    parameters: { /* ... */ },
    assumedRoleArn: 'arn:aws:iam::ACCOUNT-B:role/DynamoDBRole',
    externalId: 'update-external-id-xyz789',  // Different External ID
  },
  onDelete: {
    service: 'DynamoDB',
    action: 'deleteItem',
    parameters: { /* ... */ },
    assumedRoleArn: 'arn:aws:iam::ACCOUNT-B:role/DynamoDBRole',
    externalId: 'delete-external-id-def456',  // Different External ID
  },
  policy: AwsCustomResourcePolicy.fromStatements([
    new PolicyStatement({
      effect: Effect.ALLOW,
      actions: ['sts:AssumeRole'],
      resources: ['arn:aws:iam::ACCOUNT-B:role/DynamoDBRole'],
    }),
  ]),
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Validation and Testing&lt;/strong&gt;&lt;br&gt;
Comprehensive Test Coverage&lt;br&gt;
The contribution includes three levels of testing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Unit Tests (10 test cases)
&lt;ul&gt;
&lt;li&gt;External ID parameter propagation to CloudFormation&lt;/li&gt;
&lt;li&gt;Different External IDs for different operations&lt;/li&gt;
&lt;li&gt;Backward compatibility without External ID&lt;/li&gt;
&lt;li&gt;Integration with existing assumedRoleArn&lt;/li&gt;
&lt;li&gt;CloudFormation template validation&lt;/li&gt;
&lt;li&gt;Edge cases and error scenarios&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;test('can specify external ID for cross-account operations', () =&amp;gt; {
  const stack = new Stack();

  new AwsCustomResource(stack, 'MyResource', {
    onCreate: {
      service: 'S3',
      action: 'listBuckets',
      assumedRoleArn: 'arn:aws:iam::123456789012:role/MyRole',
      externalId: 'my-external-id',
      physicalResourceId: PhysicalResourceId.of('list-buckets'),
    },
    policy: AwsCustomResourcePolicy.fromSdkCalls({ resources: ['*'] }),
  });

  Template.fromStack(stack).hasResourceProperties('Custom::AWS', {
    Create: {
      assumedRoleArn: 'arn:aws:iam::123456789012:role/MyRole',
      externalId: 'my-external-id',
    },
  });
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Integration Tests (4 scenarios)
&lt;ul&gt;
&lt;li&gt;Real cross-account role assumption&lt;/li&gt;
&lt;li&gt;STS GetCallerIdentity validation&lt;/li&gt;
&lt;li&gt;CDK snapshot validation&lt;/li&gt;
&lt;li&gt;End-to-end workflow testing&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const role = new iam.Role(stack, 'Role', {
  assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'),
  externalIds: ['external-id-12345'],
});

const resource = new cr.AwsCustomResource(stack, 'GetCallerIdentity', {
  onCreate: {
    service: 'STS',
    action: 'getCallerIdentity',
    assumedRoleArn: role.roleArn,
    externalId: 'external-id-12345',
    physicalResourceId: cr.PhysicalResourceId.of('caller-identity'),
  },
  policy: cr.AwsCustomResourcePolicy.fromSdkCalls({ resources: ['*'] }),
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Lambda Handler Tests (7 test cases)
&lt;ul&gt;
&lt;li&gt;getCredentials function correctly passes External ID&lt;/li&gt;
&lt;li&gt;STS AssumeRole includes ExternalId parameter&lt;/li&gt;
&lt;li&gt;Backward compatibility verification&lt;/li&gt;
&lt;li&gt;Type safety validation&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Design Decisions and Alternatives&lt;/strong&gt;&lt;br&gt;
Why Optional External ID?&lt;br&gt;
Making externalId optional maintains backward compatibility. Existing users aren't forced to change their code, while new users can adopt the security best practice.&lt;/p&gt;

&lt;p&gt;Why Per-Operation External IDs?&lt;br&gt;
Different operations may require different security contexts. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create: Stricter security with unique External ID&lt;/li&gt;
&lt;li&gt;Update: Different permissions, different External ID&lt;/li&gt;
&lt;li&gt;Delete: Potentially destructive, requires highest security&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This granularity lets each operation be locked down independently.&lt;/p&gt;

&lt;p&gt;Key Takeaways for Contributors&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Security-First Design&lt;br&gt;
Always consider security implications, especially for cross-account operations. Reference AWS documentation and best practices.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Backward Compatibility is Critical&lt;br&gt;
CDK is used by thousands of organizations. Any breaking change can affect production systems. Design features to be additive.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Comprehensive Testing&lt;br&gt;
Unit tests, integration tests, and manual validation ensure robustness. Don't skimp on test coverage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Documentation Matters&lt;br&gt;
Inline documentation, README updates, and practical examples help users understand and adopt new features safely.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Follow Existing Patterns&lt;br&gt;
Consistency with existing CDK patterns makes features more intuitive and maintainable.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Conclusion&lt;br&gt;
Adding External ID support to AwsCustomResource closes a critical security gap in AWS CDK's multi-account capabilities. The feature:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prevents confused deputy attacks&lt;/li&gt;
&lt;li&gt;Maintains full backward compatibility&lt;/li&gt;
&lt;li&gt;Follows AWS security best practices&lt;/li&gt;
&lt;li&gt;Provides flexible per-operation configuration&lt;/li&gt;
&lt;li&gt;Includes comprehensive testing and documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This contribution demonstrates that security enhancements don't have to come at the cost of usability or backward compatibility. By carefully considering design decisions and thoroughly testing the implementation, we can deliver features that make AWS CDK both more powerful and more secure.&lt;/p&gt;

</description>
      <category>security</category>
      <category>awscdk</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Open Source as a Force Multiplier: How Small Fixes Scale Global Impact</title>
      <dc:creator>Michael Uanikehi</dc:creator>
      <pubDate>Sun, 19 Oct 2025 23:50:41 +0000</pubDate>
      <link>https://dev.to/oyiz-michael/open-source-as-a-force-multiplier-how-small-fixes-scale-global-impact-13fj</link>
      <guid>https://dev.to/oyiz-michael/open-source-as-a-force-multiplier-how-small-fixes-scale-global-impact-13fj</guid>
      <description>&lt;p&gt;Sometimes the things that change your career don’t start big.&lt;br&gt;
They start with a tiny pull request.&lt;/p&gt;

&lt;p&gt;For me, it was on AWS Lambda Powertools for Python, a library I’d used countless times in production.&lt;br&gt;
I wasn’t trying to do anything revolutionary. I just noticed that working with form data in serverless APIs was a little… clunky. So, I opened an issue, wrote a few lines of code, added some tests, and pushed a PR.&lt;/p&gt;

&lt;p&gt;That little change ended up helping thousands of developers build cleaner, better-documented AWS Lambda APIs.&lt;br&gt;
And that’s when it hit me: &lt;em&gt;Open Source is a force multiplier&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How a “tiny fix” became a big deal&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you contribute to open source, you’re not just solving your own problem. You’re solving everyone’s who runs into that same wall after you.&lt;/p&gt;

&lt;p&gt;My contribution added OpenAPI form-data support. Basically, it let Lambda developers automatically document and validate form submissions without extra code. Nothing flashy. But it saved people time. It cleaned up their docs. It made their lives a little easier.&lt;/p&gt;

&lt;p&gt;That’s the magic of open source. You make a small move, and somehow, it ripples outward into something much bigger than you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I learned in the process&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The code itself wasn’t the hardest part. It was everything around it: the reviews, the feedback loops, the back-and-forth with maintainers who genuinely wanted to help me improve it.&lt;/p&gt;

&lt;p&gt;Here’s what stood out to me:&lt;br&gt;
    1.  Small scope doesn’t mean small impact. Sometimes the best contributions aren’t new features; they’re refinements that make everyone’s day smoother.&lt;br&gt;
    2.  Good maintainers are secret teachers. They don’t just merge code; they help you understand why something should be done a certain way.&lt;br&gt;
    3.  Documentation and tests matter just as much as code. They make your contribution live longer than you do in the repo.&lt;br&gt;
    4.  Community is where confidence grows. Each PR teaches you a little more about how other engineers think, and that’s gold.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why open source still matters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a world where AI can generate a function in seconds, what still makes open source special isn’t the speed; &lt;em&gt;it’s the shared understanding.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When you contribute, you’re not just adding lines of code; you’re adding patterns, lessons, and empathy into the ecosystem.&lt;br&gt;
&lt;em&gt;That’s what scales! That’s what lasts!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So, where do you start?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you’ve been wanting to contribute but it feels intimidating, here’s my honest advice:&lt;br&gt;
Start small.&lt;/p&gt;

&lt;p&gt;You don’t need to build a new framework or rewrite an entire service. Sometimes, it’s about fixing something you bump into every day at work. &lt;br&gt;
Fix a typo. Improve a docstring. Write a test for something untested.&lt;br&gt;
The maintainers will thank you, and you’ll get the hang of the flow: the issues, the reviews, the merge.&lt;/p&gt;

&lt;p&gt;Don’t wait for a “big idea.” Open source grows one small improvement at a time.&lt;/p&gt;

&lt;p&gt;That’s literally how another one of my contributions happened. &lt;br&gt;
I was working on an AWS project and noticed that the amazon-efs-utils tool wasn’t correctly handling the --region flag during cross-region mounts.&lt;br&gt;
It was a tiny thing, but it caused big headaches for anyone mounting EFS volumes across AWS regions.&lt;/p&gt;

&lt;p&gt;So I fixed it, opened a PR, and it got merged upstream.&lt;br&gt;
Now, that small fix helps every engineer who uses EFS in multi-region setups.&lt;/p&gt;

&lt;p&gt;That’s the beauty of open source: you’re not just patching your own problem; you’re preventing hundreds of others from hitting the same wall.&lt;/p&gt;

&lt;p&gt;Fix the thing that slows you down today, and chances are, you’ll speed up someone else tomorrow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final thought&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I’ve worked on large systems and seen how much effort goes into building reliable cloud platforms, but nothing compares to the shared power of open source.&lt;br&gt;
It’s the one place where a single developer can write 10 lines of code and quietly make life easier for 10,000 others.&lt;/p&gt;

&lt;p&gt;And that, to me, is the real definition of scale.&lt;/p&gt;

&lt;p&gt;If you’ve ever hesitated to make your first contribution, this is your sign to go for it!&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>community</category>
      <category>productivity</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Turn log lines into alerts (without building a whole observability stack)</title>
      <dc:creator>Michael Uanikehi</dc:creator>
      <pubDate>Sat, 13 Sep 2025 11:57:51 +0000</pubDate>
      <link>https://dev.to/aws-builders/turn-log-lines-into-alerts-without-building-a-whole-observability-stack-3ii</link>
      <guid>https://dev.to/aws-builders/turn-log-lines-into-alerts-without-building-a-whole-observability-stack-3ii</guid>
      <description>&lt;p&gt;Cold truth: problems always show up in logs first. The trick is turning those “uh-oh” lines into a nudge in your inbox before users feel it.&lt;/p&gt;

&lt;p&gt;Here’s the dead-simple pattern I use in AWS:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CloudWatch Logs → Metric Filter → Alarm → SNS (Email/Slack)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;No new services to run. No extra agents. Just wiring.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Think of CloudWatch Logs as a river. Metric filters are little nets you drop in: “catch anything that looks like ERROR” or “grab JSON where level=ERROR and service=payments.” Each catch bumps a metric. Alarms watch that metric and, boom: email, Slack, PagerDuty, whatever you like.&lt;/p&gt;

&lt;p&gt;Cheap. Fast. No app changes.&lt;/p&gt;

&lt;p&gt;App → CloudWatch Logs → (metric filter) → Metric → Alarm → SNS → Email/Slack&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: create an SNS topic (so you get alerted)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws sns create-topic --name app-alarms
# copy the "TopicArn" from the output
TOPIC_ARN="arn:aws:sns:REGION:ACCOUNT_ID:app-alarms"

# subscribe your email (confirm the email to activate)
aws sns subscribe \
  --topic-arn "$TOPIC_ARN" \
  --protocol email \
  --notification-endpoint you@example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: add a metric filter to your log group&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Option A — simple keyword (“ERROR” but not health checks):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LOG_GROUP="/aws/lambda/my-fn"

aws logs put-metric-filter \
  --log-group-name "$LOG_GROUP" \
  --filter-name "ErrorCount" \
  --filter-pattern '"ERROR" -HealthCheck' \
  --metric-transformations \
      metricName=ErrorCount,metricNamespace="App/Alerts",metricValue=1,defaultValue=0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Option B — structured JSON logs (recommended):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws logs put-metric-filter \
  --log-group-name "$LOG_GROUP" \
  --filter-name "PaymentsErrors" \
  --filter-pattern '{ $.level = "ERROR" &amp;amp;&amp;amp; $.service = "payments" }' \
  --metric-transformations \
      metricName=PaymentsErrorCount,metricNamespace="App/Alerts",metricValue=1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
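&lt;p&gt;The JSON pattern in Option B only matches if your application writes one JSON object per log line with level and service fields. Here is a minimal Python sketch of such a logger. The helper name log_event and the extra order_id field are assumptions for illustration; in a real function you might prefer a structured logging library.&lt;/p&gt;

```python
import json
import sys
import time


def log_event(level, service, message, stream=sys.stdout, **fields):
    """Write one JSON object per line so a JSON metric filter can match on its fields."""
    record = {
        "timestamp": int(time.time() * 1000),
        "level": level,      # matched by $.level in the filter pattern
        "service": service,  # matched by $.service in the filter pattern
        "message": message,
        **fields,
    }
    line = json.dumps(record)
    stream.write(line + "\n")
    return line


# This line would be counted by the PaymentsErrors filter above.
log_event("ERROR", "payments", "card declined by processor", order_id="demo-123")
```

&lt;p&gt;In Lambda, anything written to stdout ends up in the function's log group, so no extra shipping is needed.&lt;/p&gt;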



&lt;p&gt;&lt;strong&gt;Step 3: create an alarm on that metric&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Alert if we see ≥ 1 error per minute for 3 consecutive minutes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws cloudwatch put-metric-alarm \
  --alarm-name "LambdaErrorBurst" \
  --metric-name ErrorCount \
  --namespace "App/Alerts" \
  --statistic Sum \
  --period 60 \
  --evaluation-periods 3 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --treat-missing-data notBreaching \
  --alarm-actions "$TOPIC_ARN" \
  --ok-actions "$TOPIC_ARN"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The --treat-missing-data notBreaching setting treats minutes with no data points as healthy, so the alarm stays in OK instead of changing state when traffic is quiet.&lt;/p&gt;
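&lt;p&gt;To see how period, evaluation periods, threshold and the missing-data setting interact, here is a simplified Python model of the alarm configured above. It is a teaching sketch, not CloudWatch's actual evaluation logic, and alarm_state is a hypothetical helper.&lt;/p&gt;

```python
from operator import ge  # ge(a, b) means "a is greater than or equal to b"


def alarm_state(period_sums, threshold=1, evaluation_periods=3):
    """Model the alarm above: one Sum per 60-second period, oldest first.

    None stands for a period with no data points.
    """
    # Pad short histories with None so they count as missing data.
    padded = [None] * evaluation_periods + list(period_sums)
    recent = padded[-evaluation_periods:]
    # notBreaching: a period with no data is treated as within the threshold.
    breaching = [value is not None and ge(value, threshold) for value in recent]
    return "ALARM" if all(breaching) else "OK"


print(alarm_state([1, 2, 5]))     # ALARM: three breaching periods in a row
print(alarm_state([1, None, 3]))  # OK: the quiet minute counts as healthy
```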

&lt;p&gt;&lt;strong&gt;Step 4: test it (don’t skip this)&lt;/strong&gt;&lt;br&gt;
    1.  Log an ERROR that matches your filter.&lt;br&gt;
    2.  In CloudWatch Metrics → App/Alerts, make sure the metric ticks up.&lt;br&gt;
    3.  Watch the alarm flip to ALARM and check your email.&lt;/p&gt;

&lt;p&gt;If nothing happens, go to your Log Group → Metric filters → Test pattern and paste a real log line. It’ll tell you if your pattern matches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prefer Terraform? Here’s the whole thing&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_sns_topic" "app_alarms" {
  name = "app-alarms"
}

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.app_alarms.arn
  protocol  = "email"
  endpoint  = "you@example.com"
}

resource "aws_cloudwatch_log_metric_filter" "errors" {
  name           = "ErrorCount"
  log_group_name = "/aws/lambda/my-fn"
  pattern        = "\"ERROR\" -HealthCheck"

  metric_transformation {
    name          = "ErrorCount"
    namespace     = "App/Alerts"
    value         = "1"
    default_value = "0"
  }
}

resource "aws_cloudwatch_metric_alarm" "error_alarm" {
  alarm_name          = "LambdaErrorBurst"
  namespace           = "App/Alerts"
  metric_name         = "ErrorCount"
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 3
  threshold           = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  treat_missing_data  = "notBreaching"
  alarm_actions       = [aws_sns_topic.app_alarms.arn]
  ok_actions          = [aws_sns_topic.app_alarms.arn]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Common gotchas&lt;/strong&gt; (learned the hard way)&lt;br&gt;
    • Case matters. ERROR ≠ error. Match what you actually log.&lt;br&gt;
    • Per line matching. Filters look at one line at a time. If your stack trace spans lines, rely on a JSON level field instead.&lt;br&gt;
    • Right account/region. Metric filters must live with the log group.&lt;br&gt;
    • Don’t explode cardinality. Keep one metric per signal; don’t bake IDs into metric names.&lt;br&gt;
    • No alerts during quiet times. That treat_missing_data setting is your chill pill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Variations&lt;/strong&gt; you’ll probably want&lt;br&gt;
    • Slack/Teams: SNS → Lambda → Slack (point &amp;amp; click).&lt;br&gt;
    • PagerDuty/Opsgenie: SNS → EventBridge → your incident tool.&lt;br&gt;
    • Smarter thresholds: Try Anomaly Detection alarms once you have baseline traffic.&lt;br&gt;
    • Composite alarms: “Only alert if errors spike and p50 latency is ugly.”&lt;/p&gt;

&lt;p&gt;You don’t need a massive observability rebuild to get useful alerts. Start with one or two high-signal patterns (timeouts, 5xx, “payment failed”), wire them to email, and iterate.&lt;/p&gt;

&lt;p&gt;Tiny effort. Big safety net.&lt;/p&gt;

</description>
      <category>cloudwatch</category>
      <category>observability</category>
      <category>monitoring</category>
      <category>aws</category>
    </item>
    <item>
      <title>Killing cold starts with Lambda SnapStart</title>
      <dc:creator>Michael Uanikehi</dc:creator>
      <pubDate>Thu, 14 Aug 2025 20:39:04 +0000</pubDate>
      <link>https://dev.to/aws-builders/killing-cold-starts-with-lambda-snapstart-1h77</link>
      <guid>https://dev.to/aws-builders/killing-cold-starts-with-lambda-snapstart-1h77</guid>
      <description>&lt;p&gt;Serverless is amazing for scale, but cold starts are the silent tax we pay. In this post, we’ll unpack AWS Lambda SnapStart, how it works, what limitations matter, and how to enable it across Terraform, SAM, and CDK to slash your cold starts.&lt;/p&gt;

&lt;h2&gt;
  
  
  What SnapStart actually does:
&lt;/h2&gt;

&lt;p&gt;When you publish a version of your Lambda, SnapStart:&lt;br&gt;
    1.  Initializes your function once (imports, SDK clients, DB pools, frameworks, etc.).&lt;br&gt;
    2.  Takes a snapshot of memory + runtime state after init.&lt;br&gt;
    3.  Caches that snapshot.&lt;br&gt;
    4.  On new execution environments, restores from that snapshot instead of doing INIT again.&lt;/p&gt;

&lt;p&gt;Result: far less startup time, often ~10× faster for heavy-initialization workloads. You’ll see a new “Restore Duration” in logs and X-Ray that reflects snapshot restore time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Supported runtimes &amp;amp; key limitations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Runtimes (managed):&lt;br&gt;
    • Java 11+, Python 3.12+, .NET 8+. Not supported for Node.js, Ruby, OS-only runtimes, or container images.&lt;/p&gt;

&lt;p&gt;Limits &amp;amp; incompatibilities:&lt;br&gt;
    • Not compatible with Provisioned Concurrency (choose one).&lt;br&gt;
    • No EFS, and /tmp must be ≤ 512 MB.&lt;br&gt;
    • Works on published versions (and aliases pointing to them), not $LATEST.&lt;br&gt;
    • Pricing: for Python and .NET, you are charged for cache time (while the version is active) and per restore, and cost scales with memory size. SnapStart on Java managed runtimes has no additional charge.&lt;/p&gt;

&lt;p&gt;Lifecycle nuance:&lt;br&gt;
    • Snapshots for Python/.NET stay active as long as the version is active.&lt;br&gt;
    • For Java, a snapshot may expire after 14 days of inactivity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When SnapStart shines&lt;/strong&gt;&lt;br&gt;
    • Heavy init: big frameworks (Spring, .NET DI), large dependency graphs, JDBC/SDK client setup.&lt;br&gt;
    • Spiky or unpredictable traffic: where paying for Provisioned Concurrency would be wasteful.&lt;br&gt;
    • Latency-sensitive paths that still tolerate low-double-digit ms on restore.&lt;/p&gt;

&lt;p&gt;When SnapStart is not the right tool:&lt;br&gt;
    • You must use EFS, &amp;gt;512MB ephemeral storage, or Provisioned Concurrency.&lt;br&gt;
    • Your function depends on unique state during INIT (see “uniqueness” below).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enabling SnapStart with IaC&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Terraform&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_lambda_function" "fn" {
  function_name = "snapstart-demo"
  runtime       = "python3.12" # or java11, java17, java21, dotnet8
  handler       = "app.handler"
  role          = aws_iam_role.lambda.arn
  filename      = "build.zip"

  # SnapStart only works on published versions — publish must be true
  publish = true

  snap_start {
    apply_on = "PublishedVersions"
  }
}

resource "aws_lambda_alias" "live" {
  name             = "live"
  function_name    = aws_lambda_function.fn.function_name
  function_version = aws_lambda_function.fn.version
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Terraform requires publish = true and snap_start.apply_on = "PublishedVersions". Use an alias for safe releases.&lt;/p&gt;

&lt;p&gt;AWS SAM (template.yaml)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Resources:
  SnapStartFn:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.12
      Handler: app.handler
      CodeUri: .
      AutoPublishAlias: live
      SnapStart:
        ApplyOn: PublishedVersions

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;SnapStart appears under Edit &amp;gt; Basic settings in console, but with SAM you set it declaratively and always publish a version + alias.&lt;/p&gt;

&lt;p&gt;AWS CDK (TypeScript)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';

// Inside a Stack or Construct, where this is the construct scope
const fn = new lambda.Function(this, 'Fn', {
  runtime: lambda.Runtime.PYTHON_3_12,
  code: lambda.Code.fromAsset('dist'),
  handler: 'app.handler',
  // SnapStart is configured through SnapStartConf in aws-cdk-lib
  snapStart: lambda.SnapStartConf.ON_PUBLISHED_VERSIONS,
  currentVersionOptions: { removalPolicy: cdk.RemovalPolicy.DESTROY }
});

new lambda.Alias(this, 'Live', { aliasName: 'live', version: fn.currentVersion });

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Measuring the impact&lt;/strong&gt;&lt;br&gt;
    • CloudWatch Logs: Each cold start shows Restore Duration and Billed Restore Duration in the REPORT line. Track these to quantify improvements.&lt;br&gt;
    • AWS X-Ray: Enable tracing to see a Restore subsegment alongside invocation. Great for comparing before/after.&lt;/p&gt;
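&lt;p&gt;If you export logs for analysis, pulling the restore time out of each REPORT line is straightforward. The sketch below assumes the REPORT line is tab-separated with "Label: value" fields; the sample line and its numbers are made up for illustration, and parse_report_line and restore_ms are hypothetical helper names.&lt;/p&gt;

```python
def parse_report_line(line):
    """Split a Lambda REPORT line into a dict of label to raw value."""
    fields = {}
    for part in line.strip().split("\t"):
        label, separator, value = part.partition(": ")
        if separator:
            fields[label.strip()] = value.strip()
    return fields


def restore_ms(line):
    """Return the Restore Duration in milliseconds, or None if the line has none."""
    value = parse_report_line(line).get("Restore Duration")
    return float(value.split()[0]) if value else None


# Illustrative sample only; real values will differ.
sample = (
    "REPORT RequestId: 00000000-0000-0000-0000-000000000000\tDuration: 12.30 ms\t"
    "Billed Duration: 13 ms\tMemory Size: 512 MB\tMax Memory Used: 90 MB\t"
    "Restore Duration: 250.40 ms\tBilled Restore Duration: 251 ms"
)
print(restore_ms(sample))  # 250.4
```

&lt;p&gt;Splitting on fields rather than searching for the text "Restore Duration" avoids accidentally reading the Billed Restore Duration value instead.&lt;/p&gt;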

&lt;p&gt;&lt;strong&gt;Best practices &amp;amp; “uniqueness” gotchas&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because multiple execution environments start from the same snapshot, don’t create per-invoke unique state during init. Move anything that must be unique into the handler (or use after-restore hooks if your runtime supports them). Watch for:&lt;br&gt;
    • Random/UUID seeded at init → generate inside the handler.&lt;br&gt;
    • Ephemeral tokens/leases → fetch per request.&lt;br&gt;
    • Time-based logic in init → recompute on first invoke (or use after-restore).&lt;br&gt;
See AWS docs on uniqueness and runtime hooks if you rely on special init behavior.&lt;/p&gt;
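&lt;p&gt;A small Python sketch of those rules follows. It assumes the snapshot_restore_py module and its register_after_restore decorator are available in the Lambda Python managed runtime, and falls back to a no-op decorator elsewhere so the example still runs locally. The names INIT_TIME_ID, state and refresh_after_restore are illustrative.&lt;/p&gt;

```python
import time
import uuid

try:
    # Assumed to be provided by the Lambda Python managed runtime.
    from snapshot_restore_py import register_after_restore
except ImportError:
    # Local fallback so the sketch runs outside Lambda: a no-op decorator.
    def register_after_restore(func):
        return func

# BAD under SnapStart: computed once at init, then shared by every restored environment.
INIT_TIME_ID = uuid.uuid4().hex

# Values that must be refreshed per environment instead of frozen in the snapshot.
state = {"restored_at": None}


@register_after_restore
def refresh_after_restore():
    """Intended to run after a snapshot restore; recompute time-based values here."""
    state["restored_at"] = time.time()


def handler(event, context):
    # GOOD: unique values are generated per invocation, inside the handler.
    return {"request_id": uuid.uuid4().hex, "restored_at": state["restored_at"]}
```

&lt;p&gt;Outside Lambda the hook never fires on its own, so restored_at stays None until refresh_after_restore is called explicitly.&lt;/p&gt;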

&lt;p&gt;What’s safe in INIT?&lt;br&gt;
    • SDK clients / DB pools (connection creation may still occur lazily).&lt;br&gt;
    • Static configuration (env vars, constants).&lt;br&gt;
    • Framework boot (Spring/.NET DI, serializers).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost awareness&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You pay for:&lt;br&gt;
    • Snapshot cache time while the version is active (per-ms, minimum 3 hours).&lt;br&gt;
    • Snapshot restore GB per resume.&lt;br&gt;
Use smaller memory sizes where possible and clean up old versions to avoid lingering cache cost.&lt;/p&gt;
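&lt;p&gt;For a rough feel of how those two charges combine, here is a back-of-the-envelope Python estimate. It assumes both charges scale linearly with configured memory in GB; the function name is hypothetical, and the two price arguments are deliberately left for you to fill in from the current AWS pricing page for your region.&lt;/p&gt;

```python
MIN_CACHE_SECONDS = 3 * 60 * 60  # the 3-hour minimum mentioned above


def estimate_snapstart_cost(memory_mb, active_seconds, restores,
                            cache_price_per_gb_second, restore_price_per_gb):
    """Rough cost estimate for one published version; not a billing calculator."""
    memory_gb = memory_mb / 1024
    # Cache time is billed for at least the minimum, even for short-lived versions.
    billed_seconds = max(active_seconds, MIN_CACHE_SECONDS)
    cache_cost = memory_gb * billed_seconds * cache_price_per_gb_second
    restore_cost = memory_gb * restores * restore_price_per_gb
    return {"cache": cache_cost, "restore": restore_cost, "total": cache_cost + restore_cost}
```

&lt;p&gt;Because cache cost accrues for every active version, old versions left behind by frequent deploys add up even if they are never invoked.&lt;/p&gt;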

&lt;p&gt;&lt;strong&gt;Real-world tuning checklist&lt;/strong&gt;&lt;br&gt;
    • Publish versions + use aliases (live, beta) so each release creates a fresh snapshot.&lt;br&gt;
    • Warm critical paths: A canary/health check can invoke the alias after deploy to “prime” the version state.&lt;br&gt;
    • Log &amp;amp; trace: Compare cold start vs. restore using Restore Duration and X-Ray.&lt;br&gt;
    • Right-size memory: Faster CPU → faster restore, but balance against cache/restore costs.&lt;br&gt;
    • CI/CD: Account for snapshot creation time when publishing (deploy steps may take longer).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick examples by runtime&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Python 3.12&lt;/p&gt;

&lt;p&gt;app.py&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
# OK to create clients in init – captured in snapshot
s3 = boto3.client("s3")

def handler(event, context):
    # Generate per-invoke UUIDs/timestamps inside the handler, not at import time
    # ...your logic...
    return {"ok": True}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Java (11, 17, 21)&lt;br&gt;
    • Tune your framework bootstrap (Spring AOT, Micronaut, Quarkus). GraalVM native images are an alternative route to fast starts, but they run on OS-only runtimes, where SnapStart is not supported.&lt;br&gt;
    • Avoid init-time randomness; move uniqueness to the handler.&lt;br&gt;
    • Enable X-Ray to visualize Restore vs Invocation.&lt;/p&gt;

&lt;p&gt;.NET 8&lt;br&gt;
    • Heavy DI? Great SnapStart candidate.&lt;br&gt;
    • Ensure libraries don’t assume “fresh process” uniqueness during initialization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SnapStart vs Provisioned Concurrency (PC)&lt;/strong&gt;&lt;br&gt;
    • SnapStart: Pay per snapshot cache/restore; great for many bursty workloads; no EFS/PC.&lt;br&gt;
    • PC: Always-warm environments; extra charges constantly; works with EFS; deterministic lowest-latency.&lt;br&gt;
You can’t enable both on the same version, so pick the one that fits your constraints.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;If you’re running Java, Python 3.12+, or .NET 8+, SnapStart is a no-brainer to reduce latency without the always-on bill of Provisioned Concurrency. Enable it once, measure your gains, and enjoy faster cold starts at scale.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>lambda</category>
      <category>serverless</category>
    </item>
    <item>
      <title>How to Handle Form Data in AWS Lambda APIs with Powertools OpenAPI Support</title>
      <dc:creator>Michael Uanikehi</dc:creator>
      <pubDate>Wed, 06 Aug 2025 18:04:07 +0000</pubDate>
      <link>https://dev.to/aws-builders/how-to-handle-form-data-in-aws-lambda-apis-with-powertools-openapi-support-4l40</link>
      <guid>https://dev.to/aws-builders/how-to-handle-form-data-in-aws-lambda-apis-with-powertools-openapi-support-4l40</guid>
      <description>&lt;p&gt;A complete guide to using the new Form parameter support in AWS Lambda Powertools for Python&lt;/p&gt;

&lt;p&gt;AWS Lambda Powertools for Python now supports form data parameters in OpenAPI schema generation! This means you can build Lambda APIs that accept application/x-www-form-urlencoded data with automatic validation and documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why This Matters&lt;/strong&gt;&lt;br&gt;
Before this feature, Lambda APIs built with Powertools could only generate proper OpenAPI schemas for JSON payloads. If you needed to handle form data (like HTML forms or certain client applications), you had to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manually parse form data from the raw request body&lt;/li&gt;
&lt;li&gt;Write custom validation logic&lt;/li&gt;
&lt;li&gt;Maintain separate API documentation&lt;/li&gt;
&lt;li&gt;Handle errors without proper validation feedback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now you can handle form data declaratively with automatic validation, error handling, and OpenAPI documentation generation.&lt;/p&gt;
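&lt;p&gt;For contrast, this is roughly what the manual approach looked like. It is a simplified sketch using only the standard library. The helper names are made up, and the event keys follow the API Gateway proxy format.&lt;/p&gt;

```python
import base64
from urllib.parse import parse_qs


def parse_form_body(event):
    """Decode a form-urlencoded body from an API Gateway proxy event."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    # parse_qs returns a list per key; keep the last value for each field.
    return {key: values[-1] for key, values in parse_qs(body).items()}


def validate_contact(fields):
    """Hand-written checks that Form() annotations make unnecessary."""
    errors = []
    for required in ("name", "email", "message"):
        if not fields.get(required):
            errors.append({"loc": ["body", required], "msg": "Field required"})
    return errors
```

&lt;p&gt;Every endpoint needed its own version of this, and none of it showed up in the API documentation.&lt;/p&gt;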

&lt;p&gt;&lt;strong&gt;Getting Started&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Installation&lt;br&gt;
Make sure you have the latest version of AWS Lambda Powertools, installed with the parser extra that pulls in Pydantic (data validation relies on it):&lt;br&gt;
&lt;code&gt;pip install "aws-lambda-powertools[parser]"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Basic Form Handling&lt;br&gt;
Here's how to create a Lambda function that accepts form data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from typing import Annotated
from aws_lambda_powertools import Logger
from aws_lambda_powertools.event_handler import APIGatewayRestResolver
from aws_lambda_powertools.event_handler.openapi.params import Form

logger = Logger()
app = APIGatewayRestResolver(enable_validation=True)

@app.post("/contact")
def submit_contact_form(
    name: Annotated[str, Form(description="Contact's full name")],
    email: Annotated[str, Form(description="Contact's email address")],
    message: Annotated[str, Form(description="Contact message")]
):
    """Handle contact form submission."""
    # "name" is a reserved LogRecord attribute, so log it under a different key
    logger.info("Processing contact form", extra={
        "contact_name": name,
        "email": email
    })

    # Process the form data
    # (save to database, send email, etc.)

    return {
        "message": "Contact form submitted successfully",
        "contact_id": "12345"
    }

def lambda_handler(event, context):
    return app.resolve(event, context)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What You Get Automatically&lt;br&gt;
With this simple setup, Powertools automatically provides:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Request Validation: Invalid form data returns proper 422 errors&lt;/li&gt;
&lt;li&gt;OpenAPI Schema: Generated schema shows form fields and types&lt;/li&gt;
&lt;li&gt;Error Handling: Detailed validation errors for debugging&lt;/li&gt;
&lt;li&gt;Content-Type Detection: Automatically parses application/x-www-form-urlencoded&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Real-World Examples&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;User Registration Form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from typing import Annotated, Optional
from aws_lambda_powertools.event_handler.openapi.params import Form
from pydantic import EmailStr  # requires the email-validator package

@app.post("/register")
def register_user(
    username: Annotated[str, Form(min_length=3, max_length=20)],
    email: Annotated[EmailStr, Form()],
    password: Annotated[str, Form(min_length=8)],
    newsletter: Annotated[bool, Form()] = False,
    referral_code: Annotated[Optional[str], Form()] = None
):
    """User registration endpoint."""

    # Automatic validation ensures:
    # - username is 3-20 characters
    # - email is valid format
    # - password is at least 8 characters
    # - newsletter is boolean
    # - referral_code is optional

    return {"user_id": "user_123", "status": "registered"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Survey/Feedback Form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from enum import Enum
from typing import Annotated, Optional

from aws_lambda_powertools.event_handler.openapi.params import Form

class SatisfactionLevel(str, Enum):
    VERY_SATISFIED = "very_satisfied"
    SATISFIED = "satisfied" 
    NEUTRAL = "neutral"
    DISSATISFIED = "dissatisfied"
    VERY_DISSATISFIED = "very_dissatisfied"

@app.post("/feedback")
def submit_feedback(
    overall_satisfaction: Annotated[SatisfactionLevel, Form()],
    product_rating: Annotated[int, Form(ge=1, le=5, description="Rating from 1-5")],
    comments: Annotated[str, Form(max_length=1000)],
    recommend: Annotated[bool, Form(description="Would you recommend us?")],
    improvements: Annotated[Optional[str], Form()] = None
):
    """Process customer feedback survey."""

    return {
        "feedback_id": "fb_456",
        "thank_you_message": "Thank you for your feedback!"
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Advanced Features&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Custom Validation&lt;br&gt;
You can add custom validation using Pydantic validators:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
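&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import re
from typing import Annotated

from aws_lambda_powertools.event_handler.openapi.params import Form
from pydantic import BaseModel, field_validator

class ContactRequest(BaseModel):
    name: Annotated[str, Form()]
    email: Annotated[str, Form()]
    phone: Annotated[str, Form()]

    @field_validator('phone')
    @classmethod
    def validate_phone(cls, v):
        # Custom phone validation logic (the hyphen goes last in the
        # character class so it is read literally, not as a range)
        if not re.match(r'^\+?[\d\s()-]+$', v):
            raise ValueError('Invalid phone number format')
        return v

# Check against your Powertools version that a model parameter is
# populated from form fields rather than from a JSON body.
@app.post("/contact-advanced")
def advanced_contact(contact: ContactRequest):
    return {"status": "received", "contact_id": "contact_789"}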
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Error Handling&lt;br&gt;
Form validation errors are automatically formatted:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# When invalid data is sent, you get detailed error responses:
{
    "detail": [
        {
            "type": "string_too_short",
            "loc": ["body", "username"],
            "msg": "String should have at least 3 characters",
            "input": "ab"
        },
        {
            "type": "value_error",
            "loc": ["body", "email"],
            "msg": "Invalid email format"
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
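&lt;p&gt;On the client side, a response shaped like the one above is easy to turn into per-field messages. This is a small sketch. &lt;code&gt;errors_by_field&lt;/code&gt; is a made-up helper, and it assumes the body keeps the &lt;code&gt;detail&lt;/code&gt;, &lt;code&gt;loc&lt;/code&gt;, and &lt;code&gt;msg&lt;/code&gt; keys shown in the example.&lt;/p&gt;

```python
import json
from collections import defaultdict


def errors_by_field(response_body):
    """Group validation messages from a 422 body by the field they refer to."""
    payload = json.loads(response_body)
    grouped = defaultdict(list)
    for error in payload.get("detail", []):
        # "loc" looks like ["body", "username"]; the last element is the field.
        field = str(error["loc"][-1]) if error.get("loc") else "unknown"
        grouped[field].append(error.get("msg", ""))
    return dict(grouped)
```

&lt;p&gt;A form UI can then show each message next to the input it belongs to.&lt;/p&gt;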



&lt;p&gt;&lt;strong&gt;Testing Your Form Endpoints&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using curl:&lt;br&gt;
&lt;code&gt;# Test the contact form&lt;br&gt;
curl -X POST https://your-api.execute-api.region.amazonaws.com/contact \&lt;br&gt;
  -H "Content-Type: application/x-www-form-urlencoded" \&lt;br&gt;
  -d "name=John Doe&amp;amp;email=john@example.com&amp;amp;message=Hello World"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Using Python requests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests

response = requests.post(
    'https://your-api.execute-api.region.amazonaws.com/contact',
    data={
        'name': 'Jane Smith',
        'email': 'jane@example.com', 
        'message': 'Great service!'
    }
)

print(response.json())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;HTML Form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;form action="https://your-api.execute-api.region.amazonaws.com/contact" method="POST"&amp;gt;
    &amp;lt;input type="text" name="name" required&amp;gt;
    &amp;lt;input type="email" name="email" required&amp;gt;
    &amp;lt;textarea name="message" required&amp;gt;&amp;lt;/textarea&amp;gt;
    &amp;lt;button type="submit"&amp;gt;Send Message&amp;lt;/button&amp;gt;
&amp;lt;/form&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
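&lt;p&gt;You can also exercise the handler locally without deploying. The helper below builds a minimal API Gateway REST proxy event with a form-encoded body. It is a sketch with a made-up name: the key set is my assumption of what the resolver needs, and your Powertools version may read additional keys.&lt;/p&gt;

```python
from urllib.parse import urlencode


def make_form_event(path, fields, method="POST"):
    """Build a minimal API Gateway REST proxy event with a form-encoded body."""
    content_type = "application/x-www-form-urlencoded"
    return {
        "httpMethod": method,
        "path": path,
        "resource": path,
        "headers": {"Content-Type": content_type},
        "multiValueHeaders": {"Content-Type": [content_type]},
        "queryStringParameters": None,
        "multiValueQueryStringParameters": None,
        "pathParameters": None,
        "requestContext": {"httpMethod": method, "path": path, "resourcePath": path},
        # urlencode takes care of escaping spaces and special characters.
        "body": urlencode(fields),
        "isBase64Encoded": False,
    }
```

&lt;p&gt;In a unit test you would pass the result to the handler, for example &lt;code&gt;lambda_handler(make_form_event("/contact", {"name": "Jane Smith", "email": "jane@example.com", "message": "Great service!"}), None)&lt;/code&gt;, and assert on the returned status code.&lt;/p&gt;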



&lt;p&gt;&lt;strong&gt;OpenAPI Documentation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your form endpoints automatically generate proper OpenAPI documentation. Here's roughly what the generated schema looks like (simplified):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;paths:
  /contact:
    post:
      requestBody:
        required: true
        content:
          application/x-www-form-urlencoded:
            schema:
              type: object
              properties:
                name:
                  type: string
                  description: "Contact's full name"
                email:
                  type: string
                  description: "Contact's email address"
                message:
                  type: string
                  description: "Contact message"
              required: ["name", "email", "message"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Integration with Swagger UI&lt;/strong&gt;&lt;br&gt;
Enable Swagger UI to get an interactive API explorer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app = APIGatewayRestResolver(enable_validation=True)
app.enable_swagger(
    path="/docs",
    title="My Contact API",
    version="1.0.0"
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Deployment Best Practices&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Environment Configuration&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
from aws_lambda_powertools import Logger, Tracer, Metrics
from aws_lambda_powertools.event_handler import APIGatewayRestResolver

logger = Logger()
tracer = Tracer()
metrics = Metrics()

# Environment-specific settings
DEBUG = os.getenv('DEBUG', 'false').lower() == 'true'
MAX_MESSAGE_LENGTH = int(os.getenv('MAX_MESSAGE_LENGTH', '1000'))

app = APIGatewayRestResolver(
    enable_validation=True,
    debug=DEBUG
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;SAM Template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  ContactFormFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.11
      Environment:
        Variables:
          POWERTOOLS_SERVICE_NAME: contact-api
          MAX_MESSAGE_LENGTH: 2000
      Events:
        ContactForm:
          Type: Api
          Properties:
            Path: /contact
            Method: post
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Monitoring and Observability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Powertools provides structured logging, tracing, and custom metrics that you can add to the same route handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@app.post("/contact")
@tracer.capture_method
def submit_contact_form(name: Annotated[str, Form()], email: Annotated[str, Form()]):

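    # Derive the domain once; fall back when the value has no "@"
    email_domain = email.rpartition('@')[2] if '@' in email else 'unknown'

    # Add custom metrics (emitted when the Lambda handler is decorated with @metrics.log_metrics)
    metrics.add_metric(name="ContactFormSubmission", unit="Count", value=1)
    metrics.add_metadata(key="email_domain", value=email_domain)

    # Structured logging ("name" is a reserved LogRecord attribute, so use another key)
    logger.info("Contact form submitted", extra={
        "contact_name": name,
        "email_domain": email_domain
    })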

    return {"status": "success"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Security Considerations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Input Sanitization&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import html

@app.post("/comment")
def submit_comment(
    content: Annotated[str, Form(max_length=500)]
):
    # Sanitize HTML content
    clean_content = html.escape(content)

    # Additional sanitization as needed
    return {"comment_id": "comment_123"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rate Limiting&lt;br&gt;
Use AWS API Gateway throttling or implement custom rate limiting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from functools import wraps
import time

def rate_limit(max_requests_per_minute=10):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Implement rate limiting logic
            return func(*args, **kwargs)
        return wrapper
    return decorator

@app.post("/contact")
@rate_limit(max_requests_per_minute=5)
def submit_contact_form(name: Annotated[str, Form()]):
    return {"status": "received"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
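&lt;p&gt;If you want the decorator above to actually limit something, one way to fill in the placeholder is a sliding window kept in memory. This is only a sketch. &lt;code&gt;RateLimitExceeded&lt;/code&gt; and the &lt;code&gt;clock&lt;/code&gt; parameter are names I made up, and the counter lives in a single Lambda execution environment, so it is not a global limit. For real protection, API Gateway throttling is the safer choice.&lt;/p&gt;

```python
import time
from collections import deque
from functools import wraps


class RateLimitExceeded(Exception):
    """Raised when the caller exceeds the allowed number of calls per minute."""


def rate_limit(max_requests_per_minute=10, clock=time.monotonic):
    def decorator(func):
        # Timestamps of recent calls seen by this execution environment only.
        calls = deque()

        @wraps(func)
        def wrapper(*args, **kwargs):
            now = clock()
            # Drop timestamps that have left the 60-second window.
            while calls and now - calls[0] >= 60:
                calls.popleft()
            if len(calls) >= max_requests_per_minute:
                raise RateLimitExceeded("Too many requests, try again later")
            calls.append(now)
            return func(*args, **kwargs)

        return wrapper
    return decorator
```

&lt;p&gt;The &lt;code&gt;clock&lt;/code&gt; argument exists only to make the window testable. In normal use you leave it at its default and map &lt;code&gt;RateLimitExceeded&lt;/code&gt; to a 429 response in your error handling.&lt;/p&gt;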



&lt;p&gt;&lt;strong&gt;Performance Tips&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use appropriate field constraints to validate data early&lt;/li&gt;
&lt;li&gt;Enable response compression for large form responses&lt;/li&gt;
&lt;li&gt;Implement caching for expensive validation operations&lt;/li&gt;
&lt;li&gt;Use async patterns for external API calls during form processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additional Resources:&lt;br&gt;
AWS Lambda Powertools Documentation: &lt;a href="https://docs.powertools.aws.dev/lambda/python/latest/" rel="noopener noreferrer"&gt;https://docs.powertools.aws.dev/lambda/python/latest/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Example Applications Repository: &lt;a href="https://github.com/aws-powertools/powertools-lambda-python/tree/develop/examples" rel="noopener noreferrer"&gt;https://github.com/aws-powertools/powertools-lambda-python/tree/develop/examples&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ready to build better form-handling Lambda APIs? Try out the new Form parameter support and let me know how it works for your use cases!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>lambda</category>
      <category>openai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Mounting Amazon EFS Across 3 Regions (Kubernetes + EC2): Work-arounds, Tweaks &amp; Startup Automation</title>
      <dc:creator>Michael Uanikehi</dc:creator>
      <pubDate>Tue, 22 Jul 2025 22:11:40 +0000</pubDate>
      <link>https://dev.to/aws-builders/mounting-amazon-efs-across-3-regions-kubernetes-ec2-work-arounds-tweaks-startup-automation-35j8</link>
      <guid>https://dev.to/aws-builders/mounting-amazon-efs-across-3-regions-kubernetes-ec2-work-arounds-tweaks-startup-automation-35j8</guid>
      <description>&lt;p&gt;Mounting Amazon EFS across multiple AWS regions is not something you do every day, but when you need to, the pain becomes real. In this article, I’ll walk through how I mounted EFS file systems from three AWS regions into Kubernetes (EKS) and EC2-based deployments. We’ll cover the architecture, common pitfalls, and practical work-arounds for both environments.&lt;/p&gt;

&lt;p&gt;EFS DNS is regional. Each mount helper expects the region-specific hostname (e.g., fs-1234.efs.us-east-1.amazonaws.com). When you point that hostname at a mount target in a different region, the helper often fails, especially inside Kubernetes, because it does a DNS check and won’t always trust /etc/hosts overrides.&lt;/p&gt;

&lt;p&gt;Why Cross-Region EFS Mounting?&lt;/p&gt;

&lt;p&gt;We manage workloads that span multiple AWS regions to support high availability and global financial clients. These workloads rely on shared file systems, and Amazon EFS was our go-to choice. However, EFS is not designed for seamless cross-region mounting. We needed a way to mount file systems from all three regions into workloads running in a single region.&lt;/p&gt;

&lt;p&gt;This was technically possible but full of edge cases.&lt;/p&gt;

&lt;p&gt;The Architecture:&lt;br&gt;
- EFS volumes in three regions (&lt;code&gt;us-east-1&lt;/code&gt;, &lt;code&gt;eu-west-1&lt;/code&gt;, and &lt;code&gt;ap-southeast-1&lt;/code&gt;)&lt;br&gt;
- EKS cluster and EC2 in &lt;code&gt;us-east-1&lt;/code&gt;&lt;br&gt;
- VPC peering between regions (NFS port 2049 open)&lt;br&gt;
- Mount helper: &lt;code&gt;amazon-efs-utils&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Common Problems&lt;/p&gt;

&lt;p&gt;&lt;code&gt;amazon-efs-utils&lt;/code&gt; requires AWS-provided DNS names, not custom hostnames or CNAMEs&lt;br&gt;
When used inside Kubernetes, DNS resolution often fails or defaults to &lt;code&gt;127.0.0.1&lt;/code&gt;&lt;br&gt;
Even when using &lt;code&gt;hostAliases&lt;/code&gt; in pods, the mount helper doesn’t always respect them&lt;br&gt;
IAM role mismatch between pod and node leads to permission errors&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes (EKS) Approach&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Key Steps&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install &lt;code&gt;amazon-efs-utils&lt;/code&gt; in an init container (or bake it into the image).&lt;/li&gt;
&lt;li&gt;Resolve the mount-target IPs for each region.&lt;/li&gt;
&lt;li&gt;Add IP/hostname pairs to &lt;code&gt;/etc/hosts&lt;/code&gt; via &lt;code&gt;hostAliases&lt;/code&gt; or an init container.&lt;/li&gt;
&lt;li&gt;Mount with &lt;code&gt;tls,iam,region=&amp;lt;SOURCE_REGION&amp;gt;&lt;/code&gt; options.
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_REGION&lt;/span&gt;
    &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1"&lt;/span&gt;
&lt;span class="na"&gt;hostAliases&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;ip&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10.94.117.128"&lt;/span&gt;
    &lt;span class="na"&gt;hostnames&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{efs&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;id}.efs.us-east-1.amazonaws.com"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;ip&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10.94.125.126"&lt;/span&gt;
    &lt;span class="na"&gt;hostnames&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{efs&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;id}.efs.ap-southeast-1.amazonaws.com"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;ip&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10.94.109.68"&lt;/span&gt;
    &lt;span class="na"&gt;hostnames&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{efs&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;id}.efs.eu-west-1.amazonaws.com"&lt;/span&gt;
&lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="s"&gt;yum install -y amazon-efs-utils&lt;/span&gt;
&lt;span class="c1"&gt;# Mount each EFS&lt;/span&gt;
&lt;span class="s"&gt;mount -t efs -o tls,iam,region=us-east-1 ${EFS_ID}:/ /mnt/efs-east&lt;/span&gt;
&lt;span class="s"&gt;mount -t efs -o tls,iam,region=ap-southeast-1 ${EFS_ID}:/ /mnt/efs-ap&lt;/span&gt;
&lt;span class="s"&gt;mount -t efs -o tls,iam,region=eu-west-1 ${EFS_ID}:/ /mnt/efs-eu&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
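&lt;p&gt;Maintaining the &lt;code&gt;hostAliases&lt;/code&gt; block by hand gets error-prone with three regions. A small helper can render it from a mapping instead. This is a sketch: the function name and the input shape are my own choices, and the file system IDs in the usage note are placeholders.&lt;/p&gt;

```python
def build_host_aliases(mount_targets):
    """Turn {region: (file_system_id, ip)} into a pod-spec hostAliases list."""
    aliases = []
    for region, (file_system_id, ip) in sorted(mount_targets.items()):
        aliases.append({
            "ip": ip,
            # One regional EFS hostname per mount-target IP.
            "hostnames": [f"{file_system_id}.efs.{region}.amazonaws.com"],
        })
    return aliases
```

&lt;p&gt;For example, &lt;code&gt;build_host_aliases({"eu-west-1": ("fs-11111111", "10.94.109.68")})&lt;/code&gt; yields one entry you can serialize into the pod spec with your YAML or JSON tooling of choice.&lt;/p&gt;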


&lt;p&gt;Challenges:&lt;br&gt;
Mount helper may still ignore &lt;code&gt;/etc/hosts&lt;/code&gt;&lt;br&gt;
Pod IAM must allow: &lt;code&gt;elasticfilesystem:ClientMount&lt;/code&gt; + &lt;code&gt;ClientWrite&lt;/code&gt;&lt;br&gt;
You can only mount if IP is reachable from the current AZ/subnet&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EC2 Approach&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Much easier than EKS thanks to direct &lt;code&gt;/etc/hosts&lt;/code&gt; control.&lt;br&gt;
User Data Script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
yum &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; amazon-efs-utils

&lt;span class="c"&gt;# Resolve EFS hostnames manually&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"10.0.111.100 &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;EFS_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.efs.us-east-1.amazonaws.com"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /etc/hosts
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"10.0.111.101 &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;EFS_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.efs.ap-southeast-1.amazonaws.com"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /etc/hosts
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"10.0.111.102 &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;EFS_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.efs.eu-west-1.amazonaws.com"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /etc/hosts

&lt;span class="c"&gt;# Create mount points&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /mnt/efs-east /mnt/efs-ap /mnt/efs-eu

&lt;span class="c"&gt;# Mount each EFS&lt;/span&gt;
mount &lt;span class="nt"&gt;-t&lt;/span&gt; efs &lt;span class="nt"&gt;-o&lt;/span&gt; tls,iam,region&lt;span class="o"&gt;=&lt;/span&gt;us-east-1 &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;EFS_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:/ /mnt/efs-east
mount &lt;span class="nt"&gt;-t&lt;/span&gt; efs &lt;span class="nt"&gt;-o&lt;/span&gt; tls,iam,region&lt;span class="o"&gt;=&lt;/span&gt;ap-southeast-1 &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;EFS_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:/ /mnt/efs-ap
mount &lt;span class="nt"&gt;-t&lt;/span&gt; efs &lt;span class="nt"&gt;-o&lt;/span&gt; tls,iam,region&lt;span class="o"&gt;=&lt;/span&gt;eu-west-1 &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;EFS_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:/ /mnt/efs-eu
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lessons Learned&lt;/p&gt;

&lt;p&gt;Only &lt;strong&gt;one mount target IP&lt;/strong&gt; per hostname works&lt;br&gt;
If the IP is unreachable, &lt;strong&gt;mount fails completely&lt;/strong&gt;, even if DNS resolves&lt;br&gt;
IAM must be correct on &lt;strong&gt;every node or pod&lt;/strong&gt;&lt;br&gt;
EKS needs extra care due to kube-dns + IAM&lt;/p&gt;

&lt;p&gt;Recommendations&lt;/p&gt;

&lt;p&gt;Use EC2 where possible if reliability matters&lt;br&gt;
In EKS, use &lt;code&gt;hostAliases&lt;/code&gt; but always test per AZ&lt;br&gt;
Consider building a helper script or sidecar to handle resolution dynamically&lt;br&gt;
Use region-specific mount options (e.g. &lt;code&gt;-o tls,iam,region=eu-west-1&lt;/code&gt;)&lt;br&gt;
Stay close to AWS guidance &lt;a href="https://docs.aws.amazon.com/efs/latest/ug/mounting-fs-mount-helper-efs-utils.html" rel="noopener noreferrer"&gt;like this one&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bonus: Automate It&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can automate:&lt;/p&gt;

&lt;p&gt;IP resolution via boto3&lt;br&gt;
IAM role patching&lt;br&gt;
Host file injection via DaemonSet&lt;br&gt;
Mount validation in init containers&lt;/p&gt;
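&lt;p&gt;As a starting point for the IP resolution step, here is a minimal sketch. It assumes &lt;code&gt;efs_client&lt;/code&gt; behaves like &lt;code&gt;boto3.client("efs")&lt;/code&gt; created for the file system's own region. The function name is made up, and it keeps a single IP because only one mount target IP per hostname works.&lt;/p&gt;

```python
def hosts_entries(efs_client, file_system_id, region):
    """Return /etc/hosts lines for the mount targets of one file system."""
    response = efs_client.describe_mount_targets(FileSystemId=file_system_id)
    hostname = f"{file_system_id}.efs.{region}.amazonaws.com"
    # Only one IP per hostname is usable, so keep the first available target.
    for target in response["MountTargets"]:
        if target.get("LifeCycleState") == "available":
            return [f"{target['IpAddress']} {hostname}"]
    return []
```

&lt;p&gt;A startup script or DaemonSet could call this once per region and append the returned lines to &lt;code&gt;/etc/hosts&lt;/code&gt;. Picking the target that is reachable from the current AZ is left out here and would need its own check.&lt;/p&gt;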

&lt;p&gt;Final Thoughts&lt;/p&gt;

&lt;p&gt;Cross-region EFS mounting does work — but it’s fragile. Knowing how DNS, IPs, IAM, and Linux internals interact is key to making it reliable. If you’ve ever fought &lt;code&gt;127.0.0.1&lt;/code&gt; DNS resolution in a pod, or had a mount fail mysteriously in one AZ but not another, this article is for you.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>infrastructure</category>
      <category>awsefs</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
