DEV Community: Michael Uanikehi

Handling File Uploads in AWS Lambda with Powertools OpenAPI (From Limitation to Production Feature)

Michael Uanikehi — Mon, 06 Apr 2026 18:13:46 +0000

Introduction

Handling file uploads in serverless APIs sounds simple until you actually try to do it.

If you're building APIs with AWS Lambda Powertools and OpenAPI validation, you quickly run into a limitation:

multipart/form-data isn’t natively supported in the same way as JSON or form-encoded requests.

That gap forces teams into workarounds:

Manual multipart parsing
Base64 hacks
Disabling validation entirely

None of which are ideal in production systems.

This article walks through:

The real problem
How the feature was designed
How you can now use it in practice

The Problem: File Uploads Break the Abstraction

Before this feature, Powertools handled:

JSON payloads
Query parameters
Headers
Form data (application/x-www-form-urlencoded)

But not:

multipart/form-data (file uploads)

That meant:

@app.post("/upload")
def upload(file: bytes):
    ...

simply didn’t work with OpenAPI validation.

Instead, developers had to:

Parse raw request bodies manually
Disable validation middleware
Or redesign APIs around non-standard formats

At scale, this creates:

Inconsistent APIs
Security gaps
Poor developer experience
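To make the "manual parsing" workaround concrete, here is a minimal sketch of the kind of code teams ended up writing by hand. It leans on Python's standard-library email parser to split the parts. The function name parse_multipart and the shape of the returned dictionary are assumptions made for illustration; they are not Powertools APIs.

```python
import base64
from email.parser import BytesParser
from email.policy import HTTP


def parse_multipart(event: dict) -> dict:
    """Return {field_name: (filename, content_bytes)} from a Lambda proxy event."""
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    content_type = headers["content-type"]

    body = event.get("body") or ""
    raw = base64.b64decode(body) if event.get("isBase64Encoded") else body.encode("utf-8")

    # The email parser needs a Content-Type header to discover the boundary,
    # so we prepend one in front of the raw multipart body.
    message = BytesParser(policy=HTTP).parsebytes(
        b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + raw
    )

    fields = {}
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        fields[name] = (part.get_filename(), part.get_payload(decode=True))
    return fields
```

Every handler that accepts uploads needs some variation of this, and none of it is validated or described in the OpenAPI schema.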

The Goal: Make File Uploads First-Class

The aim was simple:

Make file uploads work the same way as Query(), Header(), and Form()

That means:

Type-safe
Automatically validated
Fully reflected in OpenAPI schema
Works with Swagger UI

The Solution: `File()` Parameter Support

You can now define file inputs like this:

from typing import Annotated
from aws_lambda_powertools.event_handler import APIGatewayRestResolver
from aws_lambda_powertools.event_handler.openapi.params import File, UploadFile

app = APIGatewayRestResolver(enable_validation=True)
app.enable_swagger(path="/swagger")

@app.post("/upload")
def upload(file_data: Annotated[UploadFile, File(description="File to upload")]):
    return {
        "filename": file_data.filename,
        "content_type": file_data.content_type,
        "file_size": len(file_data),
    }
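For context, OpenAPI 3.0 describes an uploaded file as a string property with format: binary inside a multipart/form-data request body. The sketch below builds that shape as a plain dictionary. The helper name multipart_request_body is hypothetical, and the exact schema Powertools generates may differ in detail.

```python
def multipart_request_body(field_name: str, description: str) -> dict:
    """Build an OpenAPI 3.0 requestBody for a single uploaded file field."""
    return {
        "required": True,
        "content": {
            "multipart/form-data": {
                "schema": {
                    "type": "object",
                    "properties": {
                        # OpenAPI 3.0 models binary uploads as string + format: binary
                        field_name: {
                            "type": "string",
                            "format": "binary",
                            "description": description,
                        },
                    },
                    "required": [field_name],
                }
            }
        },
    }


request_body = multipart_request_body("file_data", "File to upload")
```

This is the shape Swagger UI needs in order to render a file picker for the field.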

Two Ways to Work with Files

1. Raw bytes

file: Annotated[bytes, File()]

You get:

File content only

2. Rich file object

file_data: Annotated[UploadFile, File()]

You get:

Content
Filename
Content type

This is usually what you want in real systems.

Combining Files with Form Data

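from aws_lambda_powertools.event_handler.openapi.params import File, Form, UploadFile

@app.post("/upload-csv")
def upload_csv(
    file_data: Annotated[UploadFile, File(description="CSV file")],
    separator: Annotated[str, Form(description="CSV separator")] = ",",
):
    # Decode the uploaded bytes, then split each row on the submitted separator
    text = file_data.content.decode("utf-8")
    rows = [line.split(separator) for line in text.splitlines()]
    return {"rows": len(rows)}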

This unlocks:

Metadata + file uploads
Real-world API patterns

What Changed Under the Hood

Supporting this wasn’t just adding a new parameter type.

It required:

Multipart parsing logic
Boundary handling (including WebKit quirks)
Base64 decoding for Lambda event payloads
Differentiating file vs form fields
OpenAPI schema generation (format: binary)
Validation integration

There’s also a helpful runtime safeguard:

If multipart requests aren’t properly base64 encoded, a warning is emitted

This helps catch common misconfigurations early.
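Two of those steps, boundary extraction and base64 decoding with a warning, can be sketched in a few lines. This is a simplified illustration under my own assumptions, not the code that shipped in Powertools. The function names extract_boundary and decode_body and the warning text are hypothetical.

```python
import base64
import warnings


def extract_boundary(content_type: str) -> bytes:
    """Pull the boundary parameter out of a multipart Content-Type header."""
    for param in content_type.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key.lower() == "boundary":
            # Boundaries may be quoted; browsers often send long dash-prefixed ones
            return value.strip('"').encode("ascii")
    raise ValueError("multipart request is missing a boundary")


def decode_body(event: dict) -> bytes:
    """Return the raw request bytes, warning when the body was not base64 encoded."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        return base64.b64decode(body)
    warnings.warn(
        "multipart body is not base64 encoded; binary content may be corrupted",
        stacklevel=2,
    )
    return body.encode("utf-8")
```

The warning matters because a binary file passed through as text can be silently mangled before your handler ever sees it.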

API Gateway Gotcha (Important)

If you're using REST API (v1), you must configure:

Globals:
  Api:
    BinaryMediaTypes:
      - "multipart~1form-data"

Without this:
File uploads won’t work correctly.

For:

HTTP API (v2)
Lambda Function URLs
ALB

It works out of the box.

Before vs After

Before

Manual parsing
No validation
Custom schemas
Inconsistent APIs

After

Native File() support
OpenAPI validation
Swagger UI integration
Cleaner, safer APIs

Why This Matters

This isn’t just about file uploads.

It’s about removing friction from real-world API design.

When basic capabilities are missing:

Engineers build workarounds
Systems become inconsistent
Reliability suffers

By making file uploads a first-class feature:

APIs become more predictable
Validation becomes reliable
Developer experience improves significantly

Open Source Insight

One interesting part of this work:

The implementation evolved through multiple iterations before reaching the final version.

Large features often:

Start broad
Get refined for maintainability
Land as cleaner, more focused implementations

That process is what makes open source powerful —
it’s not just about shipping code, but improving it collaboratively.

Final Thoughts

If you’re building serverless APIs and dealing with file uploads:

You no longer need workarounds.

This feature brings Powertools closer to frameworks like:

FastAPI
Django
Express

…but in a serverless-native way.

References

Feature PR: https://github.com/aws-powertools/powertools-lambda-python/pull/8093
Original implementation: https://github.com/aws-powertools/powertools-lambda-python/pull/7132
Feature request: https://github.com/aws-powertools/powertools-lambda-python/issues/7124

How I Fixed an SSI-Breaking Bug in NGINX Gateway Fabric

Michael Uanikehi — Mon, 23 Mar 2026 21:47:24 +0000

Introduction

This bug found me, not the other way around.

I was in the middle of migrating my team's infrastructure from NGINX Ingress Controller to NGINX Gateway Fabric when SSI (Server-Side Includes) stopped working. Subrequests were hitting the wrong backend paths, and pages that relied on SSI includes were silently broken. I could have worked around it, added a flag, patched a config, moved on. Instead, I decided to find where it was actually coming from and fix it at the source.

This is the story of that bug, what caused it, and the one-line condition that fixed it.

Background: What is NGINX Gateway Fabric?

NGINX Gateway Fabric is a Kubernetes Gateway API implementation backed by NGINX. It translates Kubernetes HTTPRoute resources into NGINX configuration, handling routing, load balancing, and traffic management. It's the next-generation replacement for NGINX Ingress Controller, built around the Kubernetes Gateway API spec.

When processing HTTP routes, it generates proxy_pass directives in NGINX location blocks to forward traffic to upstream backends.

The Migration That Surfaced the Bug

During the migration from NGINX Ingress Controller to NGINX Gateway Fabric, one of our services used SSI to compose pages from multiple backend responses, using a pattern like:

<h1>SSI Test Page</h1>
<!--# include virtual="/include.html" -->

On NGINX Ingress Controller this worked fine. After switching to NGINX Gateway Fabric, the SSI includes were broken. After some investigation, the generated proxy_pass directive was the culprit:

proxy_pass http://my-backend$request_uri;

$request_uri in NGINX always holds the original client request URI. It never changes, even during internal subrequests. So when SSI triggered a subrequest to /include.html, the proxy_pass directive would still forward $request_uri (e.g. /) to the backend, completely ignoring the subrequest's intended path.
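The difference between the two variables is easy to see in a toy model. The sketch below is plain Python, not NGINX. The Request class and upstream_path function are hypothetical names, and query strings are ignored to keep it short.

```python
from dataclasses import dataclass


@dataclass
class Request:
    request_uri: str  # original client URI, fixed for the whole request
    uri: str          # current URI, changes for subrequests and internal redirects


def upstream_path(use_request_uri: bool, req: Request) -> str:
    """Path the backend would receive, depending on which variable is forwarded."""
    return req.request_uri if use_request_uri else req.uri


# The client asked for "/", and SSI then issued a subrequest for "/include.html"
ssi_subrequest = Request(request_uri="/", uri="/include.html")

broken = upstream_path(use_request_uri=True, req=ssi_subrequest)
fixed = upstream_path(use_request_uri=False, req=ssi_subrequest)
```

Here broken is "/" and fixed is "/include.html", which is exactly the mismatch that left the includes empty.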

Understanding the Two Location Types

NGINX Gateway Fabric uses two types of location blocks:

External locations — match the original client request directly. Here, $uri is already the correctly processed URI. Using $request_uri is redundant and harmful.

Internal locations — used for NJS-driven HTTP matching. When a request comes in, NJS evaluates the matching rules and calls r.internalRedirect(match.redirectPath + args), which changes $uri to an internal path like /@rule0-route0. Without $request_uri, NGINX would forward that internal path to the backend — so $request_uri is needed here to restore the original client URI.

The bug was that the code made no distinction between these two cases.

The Fix

In internal/controller/nginx/config/servers.go, the createProxyPass function was changed from:

// Before: applies to ALL non-gRPC locations
if !grpc {
    if filter == nil || filter.Path == nil {
        requestURI = "$request_uri"
    }
}

To:

// After: only applies to INTERNAL locations
if !grpc && locationType == http.InternalLocationType {
    if filter == nil || filter.Path == nil {
        requestURI = "$request_uri"
    }
}

This means:

External locations generate: proxy_pass http://my-backend;
Internal locations still generate: proxy_pass http://my-backend$request_uri;
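For readers who don't write Go, the same decision can be sketched in Python. This mirrors only the non-gRPC branch shown above. The function and constant names are my own for illustration and do not exist in NGINX Gateway Fabric.

```python
INTERNAL = "internal"
EXTERNAL = "external"


def create_proxy_pass(backend: str, location_type: str, has_path_filter: bool = False) -> str:
    """Render a proxy_pass directive for a non-gRPC location."""
    request_uri = ""
    # Only internal locations need the original client URI restored,
    # and only when no path-rewriting filter is in play.
    if location_type == INTERNAL and not has_path_filter:
        request_uri = "$request_uri"
    return f"proxy_pass http://{backend}{request_uri};"


external_directive = create_proxy_pass("my-backend", EXTERNAL)
internal_directive = create_proxy_pass("my-backend", INTERNAL)
```

The whole fix is the extra location_type check: everything else about directive generation stays the same.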

Verifying the Fix

I deployed the fix to a local kind cluster with an SSI-enabled backend. Before the fix, the SSI include failed because the subrequest hit the wrong path. After the fix, the response correctly returned:

<h1>SSI Test Page</h1>
<p>This content was included via SSI!</p>

The generated NGINX config confirmed the clean proxy_pass directive with no $request_uri for the external location.

Takeaways

Real-world migrations are great bug finders. Moving from NGINX Ingress Controller to NGINX Gateway Fabric exposed a behavioral difference that tests hadn't caught.
NGINX's $request_uri is immutable across the entire request lifecycle, including subrequests. It always reflects the original client URI.
Blindly appending $request_uri to proxy_pass can interfere with NGINX features that rely on internal subrequests (SSI, auth subrequests, etc.).
When you hit a bug in open source, you can just fix it. The maintainers were responsive and collaborative throughout the review process.

PR: https://github.com/nginx/nginx-gateway-fabric/pull/4935

Measuring What Matters: Rethinking Serverless Workflows with AWS Lambda Durable Functions

Michael Uanikehi — Sat, 21 Mar 2026 19:12:41 +0000

Most serverless workflows don’t fail because they can’t scale.

They fail because when something goes wrong, engineers can’t easily answer:
• Where did this workflow break?
• What state was it in?
• What happened before the failure?

This is where “measuring what matters” becomes important.

Not more metrics.
Not more dashboards.
But better ways to understand system behaviour.

Recently, I explored AWS Lambda Durable Functions, and it exposed something interesting:

The way we structure workflows directly affects how well we can observe and debug them.

The Problem: Orchestration vs Understanding

If you’ve built workflows using AWS Step Functions, you already know the benefits:
• Clear state transitions
• Visual workflows
• Strong integration with AWS services

But in practice, there’s a trade-off: workflow logic lives outside your application code.

That means:
• You switch between code and state machine definitions
• Debugging often requires jumping across tools
• Context is split across logs, states, and services

This works well for orchestration.

But it doesn’t always optimise for debugging and reasoning under pressure.

What Durable Functions Change

AWS Lambda Durable Functions take a different approach.

Instead of defining workflows externally, you write them directly in code.

The biggest shift is state management. Durable Functions are regular Lambda functions enhanced with stateful execution capabilities!
Durable functions automatically checkpoint progress, suspend execution for up to one year during long-running tasks, and recover from failures.

Here’s a simplified example:

from aws_lambda_powertools import Logger

# validate_order, wait_for_payment, and ship_order are placeholders for your own step functions
logger = Logger()

def order_workflow(event):
    order_id = event["order_id"]

    logger.info(f"Processing order {order_id}")

    # Step 1: Validate order
    validate_order(order_id)

    # Step 2: Wait for payment confirmation
    wait_for_payment(order_id)

    # Step 3: Process shipment
    ship_order(order_id)

    return {"status": "completed"}

Now imagine this workflow:
• pauses after wait_for_payment()
• resumes hours later when payment is confirmed
• continues with full context preserved
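To build intuition for how a workflow can pause and later continue without redoing work, here is a toy checkpoint-and-replay model in plain Python. It is not the Durable Functions SDK. The run_step helper, the in-memory checkpoint dictionary, and the payment_confirmed flag are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

executed: List[str] = []  # records which steps actually ran (for illustration)


def run_step(checkpoints: Dict[str, Any], name: str, fn: Callable[[], Any]) -> Any:
    """Run fn once; on later replays return the stored result instead."""
    if name in checkpoints:
        return checkpoints[name]
    result = fn()
    checkpoints[name] = result
    return result


def validate_order(order_id: str) -> str:
    executed.append("validate")
    return f"{order_id}:valid"


def ship_order(order_id: str) -> str:
    executed.append("ship")
    return f"{order_id}:shipped"


def order_workflow(order_id: str, checkpoints: Dict[str, Any], payment_confirmed: bool) -> dict:
    run_step(checkpoints, "validate", lambda: validate_order(order_id))
    if not payment_confirmed:
        # Suspend: nothing else runs until the workflow is invoked again
        return {"status": "waiting_for_payment"}
    run_step(checkpoints, "ship", lambda: ship_order(order_id))
    return {"status": "completed"}


store: Dict[str, Any] = {}
first = order_workflow("order-123", store, payment_confirmed=False)
second = order_workflow("order-123", store, payment_confirmed=True)
```

The second invocation replays the function from the top, but validation is served from the checkpoint rather than executed again. In this toy the store is a dictionary; the point of a managed service is that the store is durable and handled for you.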

Why This Matters for Observability

This isn’t just about developer experience. It changes how you instrument and observe workflows.

With traditional orchestration, you rely on:
• Step Function execution graphs
• Distributed logs
• External state tracking

With Durable Functions, you can:
• Log at each logical step
• Track state transitions in code
• Correlate execution paths more naturally

Example:

logger.info({
    "step": "payment_wait",
    "order_id": order_id,
    "status": "pending"
})

Now your logs reflect business flow, not just system events.

Measuring What Actually Matters

In real systems, useful signals are not:
• “Lambda ran successfully”
• “Step transitioned”

Useful signals are:
• “Order is waiting on payment”
• “Workflow resumed after 2 hours”
• “Shipment failed after approval”

Durable Functions make it easier to express these signals because your workflow structure matches your mental model.

That alignment reduces the gap between:
• what the system is doing
• and what you think it’s doing
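A signal like "Workflow resumed after 2 hours" is just a structured event with a little business context attached. Here is a minimal sketch using only the standard library. The helper names business_event and waited_hours and the field names are hypothetical, and in a real handler you would pass the dictionary to your logger instead of serializing it yourself.

```python
import json
from datetime import datetime, timezone


def business_event(step: str, order_id: str, status: str, **context) -> str:
    """Serialize one business-level event as a JSON log line."""
    record = {"step": step, "order_id": order_id, "status": status, **context}
    return json.dumps(record, sort_keys=True)


def waited_hours(paused_at: datetime, resumed_at: datetime) -> float:
    """How long the workflow sat suspended, in hours."""
    return round((resumed_at - paused_at).total_seconds() / 3600, 2)


paused = datetime(2026, 3, 21, 10, 0, tzinfo=timezone.utc)
resumed = datetime(2026, 3, 21, 12, 0, tzinfo=timezone.utc)

line = business_event(
    "payment_wait",
    "order-123",
    "resumed",
    waited_hours=waited_hours(paused, resumed),
)
```

Because the event carries the step name and the wait duration, you can query for slow payment confirmations directly instead of reconstructing them from timestamps.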

Durable Functions vs Step Functions (Practical View)

Use Step Functions when you need:
• Service orchestration across AWS (Lambda, ECS, Glue)
• Visual workflows for operations teams
• Built-in execution tracing

Use Durable Functions when you need:
• Workflow logic tightly coupled with application code
• Faster iteration and local testing
• Simpler debugging of business logic

Trade-offs (Important)

Durable Functions are not a silver bullet.

You lose:
• visual workflow diagrams
• some operational visibility for non-engineers

And you gain:
• code-level control
• simpler reasoning
• tighter integration with your application

Final Thoughts

Reliable systems are not just systems that run. They’re systems engineers can:
• understand
• debug
• trust during incidents

Durable Functions don’t magically solve observability, but they remove a layer of abstraction that often gets in the way.

And that makes it easier to measure what actually matters.

If you’re already using Step Functions, you don’t need to replace them. But if your workflows feel harder to reason about than they should…

It might be worth trying a different approach.

Troubleshooting EFS Mount Failures in EKS: The IAM Mount Option Mystery

Michael Uanikehi — Wed, 14 Jan 2026 00:57:48 +0000

TL;DR

If you're getting mount.nfs4: access denied by server while mounting 127.0.0.1:/ when mounting EFS volumes in EKS, and your security groups are correct, you're probably missing the iam mount option in your PersistentVolume definition when using an EFS file system policy.

The Problem

We were integrating a new reporting service into our EKS cluster that needed to write reports to a shared EFS filesystem. The pod kept failing to mount with this cryptic error:

MountVolume.SetUp failed for volume "efs-pv": rpc error: code = Internal desc = Could not mount "{efs_id}:/"
Output: mount.nfs4: access denied by server while mounting 127.0.0.1:/

The Investigation Journey

Initial Suspicions (All Wrong)

Theory 1: Security Group Issues

Verified NFS traffic (TCP 2049) allowed between worker nodes and EFS mount targets
Mount targets existed in all Availability Zones
Result: Security groups were perfect. Not the issue.

Theory 2: EFS File System Policy

We had recently added an IAM-based file system policy to restrict access
Policy included conditions like aws:PrincipalArn to whitelist specific IAM roles
The breakthrough: Removing the policy made it work!

The Eureka Moment

Reading the AWS EFS troubleshooting documentation, I found this gem:

If you don't add the iam mount option with a restrictive file system policy, then the pods fail with the following error message:

mount.nfs4: access denied by server while mounting 127.0.0.1:/

Root Cause Analysis

The issue had three interconnected parts:

1. EFS File System Policy Conditions

We used aws:PrincipalArn in our policy conditions:

{
  "Condition": {
    "ArnLike": {
      "aws:PrincipalArn": [
        "arn:aws:iam::123456789012:role/worker-node-role",
        "arn:aws:iam::123456789012:role/efs-csi-driver-role"
      ]
    }
  }
}

Problem: Per AWS docs, aws:PrincipalArn and most IAM condition keys are NOT enforced for NFS client mounts to EFS. Only these conditions work:

aws:SecureTransport (Boolean)
aws:SourceIp (String - public IPs only)
elasticfilesystem:AccessPointArn (String)
elasticfilesystem:AccessedViaMountTarget (Boolean)
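A quick way to catch this before deploying is to scan the policy document for condition keys outside that list. The sketch below is a small lint check of my own, not an AWS tool. The function name unsupported_condition_keys is hypothetical, and the supported set is simply the four keys listed above.

```python
from typing import List

# The four keys listed above, lower-cased because IAM condition keys are case-insensitive
SUPPORTED_NFS_CONDITION_KEYS = {
    "aws:securetransport",
    "aws:sourceip",
    "elasticfilesystem:accesspointarn",
    "elasticfilesystem:accessedviamounttarget",
}


def unsupported_condition_keys(policy: dict) -> List[str]:
    """Return condition keys in the policy that NFS client mounts would not honour."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]

    found = []
    for statement in statements:
        for operator_block in statement.get("Condition", {}).values():
            for key in operator_block:
                if key.lower() not in SUPPORTED_NFS_CONDITION_KEYS:
                    found.append(key)
    return found
```

Running it against our original policy would have flagged aws:PrincipalArn immediately.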

2. Missing IAM Mount Option

Our PersistentVolume was missing the iam mount option:

# BEFORE - Missing iam mount option
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  storageClassName: aws-efs-csi-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: "{efs_id}"

Without iam, the EFS CSI driver doesn't authenticate using IAM roles, so any file system policy with IAM restrictions fails.

3. The EFS Mount Flow

When using the EFS CSI driver with the tls mount option:

Node-level mount happens first (via worker node IAM role)
Without iam option → Anonymous NFS mount
With iam option → Authenticated mount using IAM role credentials

The Solution

Fix 1: Added `mountOptions: [tls, iam]` to PersistentVolume

# AFTER - With iam mount option
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  storageClassName: aws-efs-csi-sc
  mountOptions:
    - tls # Encryption in transit
    - iam # Enable IAM authentication
  csi:
    driver: efs.csi.aws.com
    volumeHandle: "{efs_id}"
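If you manage many PersistentVolumes, a small check in CI can catch a missing option before it reaches the cluster. This sketch assumes the manifest has already been loaded into a Python dictionary (for example with a YAML library). The function name missing_mount_options is hypothetical.

```python
from typing import List, Sequence


def missing_mount_options(pv: dict, required: Sequence[str] = ("tls", "iam")) -> List[str]:
    """Return the required mount options that the PersistentVolume does not set."""
    configured = pv.get("spec", {}).get("mountOptions") or []
    return [option for option in required if option not in configured]


before = {
    "kind": "PersistentVolume",
    "spec": {"storageClassName": "aws-efs-csi-sc"},
}
after = {
    "kind": "PersistentVolume",
    "spec": {"storageClassName": "aws-efs-csi-sc", "mountOptions": ["tls", "iam"]},
}
```

For the manifest without mountOptions the check reports both tls and iam as missing; for the fixed manifest it returns an empty list.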

Fix 2: Use Only Supported EFS Condition Keys

If you need a file system policy, use only the supported conditions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": [
        "elasticfilesystem:ClientMount",
        "elasticfilesystem:ClientWrite",
        "elasticfilesystem:ClientRootAccess"
      ],
      "Resource": "arn:aws:elasticfilesystem:us-east-1:123456789012:file-system/{efs_id}",
      "Condition": {
        "Bool": {
          "elasticfilesystem:AccessedViaMountTarget": "true",
          "aws:SecureTransport": "true"
        }
      }
    }
  ]
}

This policy:

Requires TLS encryption (aws:SecureTransport)
Requires access via mount targets (prevents direct IP access)
Uses only supported condition keys
Relies on security groups for network-level access control

Key Learnings

1. IAM Mount Option is Required for IAM Authorization

Without -o iam, EFS mounts are anonymous. Any IAM-based file system policy will deny access.

2. Not All IAM Conditions Work with EFS

Only 4 condition keys are enforced for NFS mounts. Using others creates a false sense of security.

3. Layer Your Security Properly

Network Layer: Security groups (who can reach mount targets)
IAM Layer: IAM policies on roles (what actions are allowed)
File System Layer: EFS policy (additional restrictions)

4. Read the Error Logs Carefully

The error message mentioned 127.0.0.1 because the EFS mount helper creates a local stunnel proxy for TLS. The actual connection fails at the IAM authorization layer, not the network layer.

5. Test Mount Operations Manually

SSH to a worker node and test the mount with the EFS mount helper:

sudo mount -t efs -o tls,iam {efs_id}:/ /mnt/test

This validates the configuration outside of Kubernetes.

Conclusion

What seemed like a complex IAM policy issue turned out to be a missing mount option. The key insight was understanding that EFS file system policies require explicit IAM authentication via the iam mount option, and that most IAM condition keys don't apply to NFS mounts.

Measuring What Matters: Adding Multiple Dimension Sets to AWS Lambda Powertools

Michael Uanikehi — Mon, 12 Jan 2026 23:36:20 +0000

Most production systems don’t fail because they lack metrics.
They fail because the metrics they do have flatten reality.

Over time, I kept seeing the same pattern across teams and architectures: engineers had plenty of dashboards, yet struggled to answer simple questions during incidents.

Not because the data wasn’t there, but because it was aggregated in ways that hid meaningful differences.

This is the problem that led to the addition of multiple dimension sets in AWS Lambda Powertools for Python.

The Real Problem: Aggregation, Not Instrumentation

CloudWatch’s Embedded Metric Format (EMF) has long supported dimensional metrics.
In theory, this allows teams to slice metrics by environment, region, customer type, or deployment shape.

In practice, most teams are forced to choose one aggregation view per metric emission.

You can measure latency by:
• service + region, or
• service + environment, or
• service + customer_type

But not all of them at once, unless you emit the same metric repeatedly with different dimension combinations.

That trade-off shows up quickly in real systems:
• Metrics get duplicated
• Code becomes verbose and fragile
• CloudWatch costs increase
• Important aggregation paths are missing when you need them most

The result isn’t just inefficiency; it’s lost confidence during incidents.

The Feature Request That Captured the Pattern

This limitation wasn’t theoretical.

In early 2025, a community member opened a feature request in the AWS Lambda Powertools repository:

“Add support for multiple dimension sets to the same Metrics instance”
(Issue #6198)

The use case was clear:
• A Lambda deployed across multiple regions and environments
• Metrics that needed to be aggregated by environment, region, and both
• One metric value, many meaningful views

The request also highlighted an important fact:

The EMF specification already supports this.

The Dimensions field in EMF is defined as an array of arrays, with each inner array representing a different aggregation view.

Other Powertools runtimes (TypeScript, Java, .NET) already exposed this capability.

Python didn’t.
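To show what "an array of arrays" means in practice, here is a hand-built EMF payload with three dimension sets for a single metric value. It uses only the standard library. The build_emf helper is a hypothetical name, and the layout follows the public EMF specification rather than Powertools' exact serialized output.

```python
import json
import time
from typing import Dict, List, Optional


def build_emf(
    namespace: str,
    metric_name: str,
    unit: str,
    value: float,
    dimensions: Dict[str, str],
    dimension_sets: List[List[str]],
    timestamp_ms: Optional[int] = None,
) -> dict:
    """Build one EMF document that CloudWatch aggregates once per dimension set."""
    return {
        "_aws": {
            "Timestamp": timestamp_ms if timestamp_ms is not None else int(time.time() * 1000),
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # Each inner list is one aggregation view over the same value
                    "Dimensions": [list(keys) for keys in dimension_sets],
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        **dimensions,  # dimension values live at the top level of the document
        metric_name: value,
    }


payload = build_emf(
    namespace="ServerlessAirline",
    metric_name="SuccessfulRequests",
    unit="Count",
    value=100,
    dimensions={"environment": "prod", "region": "us-east-1"},
    dimension_sets=[["environment", "region"], ["environment"], ["region"]],
    timestamp_ms=1700000000000,
)
print(json.dumps(payload, indent=2))
```

One log line, one metric value, three views. That is the capability the Python runtime was not exposing.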

From Feature Request to Production-Ready Implementation

After maintainers aligned on the approach, I picked up the work to implement this feature for the Python runtime.

The goal wasn’t to invent something new.
It was to:
• Align Python with the EMF specification
• Reach feature parity with other Powertools runtimes
• Deliver a clean, intuitive API that felt natural to existing users

Design principles

Before touching code, a few constraints guided the implementation:
• Backward compatibility - existing add_dimension() behavior must remain unchanged
• Clear mental model - no hidden side effects or ambiguous APIs
• Spec-aligned output - serialized EMF must match CloudWatch expectations
• Production safety - strict validation and cleanup between invocations

The Resulting API

The final design mirrors the proven pattern from the TypeScript implementation:
• add_dimension() → adds to the primary dimension set
• add_dimensions() → creates a new aggregation view

Example usage

from aws_lambda_powertools import Metrics
from aws_lambda_powertools.metrics import MetricUnit

metrics = Metrics(namespace="ServerlessAirline", service="booking")

metrics.add_dimensions({"environment": "prod", "region": "us-east-1"})
metrics.add_dimensions({"environment": "prod"})
metrics.add_dimensions({"region": "us-east-1"})

metrics.add_metric(
    name="SuccessfulRequests",
    unit=MetricUnit.Count,
    value=100
)

With a single metric emission, CloudWatch can now aggregate across:
• environment + region
• environment only
• region only

No duplicate metrics.
No parallel pipelines.
No guesswork.

What Changed Under the Hood

The implementation introduced a few key changes:
• Tracked multiple dimension sets internally
• Updated EMF serialization to emit all dimension arrays
• Ensured default dimensions are automatically included
• Enforced CloudWatch’s 30-dimension limit
• Handled duplicate keys deterministically (“last value wins”)
• Cleared dimension state safely between invocations

To ensure reliability, the change shipped with 13 new tests, covering:
• Multiple dimension set creation
• Validation and edge cases
• Integration with existing metrics features
• High-resolution metrics compatibility

All existing tests passed, code quality checks succeeded, and maintainers approved the change for merge.
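The bookkeeping described above can be modelled in a few lines. This is a toy model for illustration only, not the code in the pull request. The DimensionSets class, its method names, and the choice to apply the 30-dimension limit per set are my assumptions.

```python
from typing import Dict, List, Optional, Tuple

MAX_DIMENSIONS = 30  # assumed here to apply to each dimension set


class DimensionSets:
    def __init__(self, default_dimensions: Optional[Dict[str, str]] = None) -> None:
        self.default_dimensions = dict(default_dimensions or {})
        self.sets: List[Dict[str, str]] = []

    def add_dimensions(self, dimensions: Dict[str, str]) -> None:
        """Register a new aggregation view, with default dimensions included."""
        if not dimensions:
            raise ValueError("a dimension set needs at least one dimension")
        merged = {**self.default_dimensions, **dimensions}
        if len(merged) > MAX_DIMENSIONS:
            raise ValueError(f"a dimension set cannot exceed {MAX_DIMENSIONS} dimensions")
        self.sets.append(merged)

    def serialize(self) -> Tuple[List[List[str]], Dict[str, str]]:
        """Return (dimension key arrays, flattened values) for the EMF document."""
        values: Dict[str, str] = {}
        for dimension_set in self.sets:
            values.update(dimension_set)  # duplicate keys: last value wins
        return [list(dimension_set) for dimension_set in self.sets], values

    def clear(self) -> None:
        """Drop per-invocation state so nothing leaks into the next invocation."""
        self.sets = []
```

Even in this simplified form you can see why validation and cleanup matter: state that survives between invocations would quietly attach the wrong dimensions to the next request's metrics.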

Why This Matters in Production

This feature doesn’t add more metrics.

It makes existing metrics more truthful.

When teams can express multiple aggregation views at the point of emission:
• Incident response becomes faster
• Dashboards become simpler
• Alerting becomes more precise
• Engineers trust what they see

Metrics are contracts.
If they can’t reflect how users actually experience the system, they quietly fail.

Multiple dimension sets don’t eliminate operational problems, but they remove a blind spot that many teams didn’t realize they had.

The full implementation, tests, and maintainer review can be found in the merged pull request:
https://github.com/aws-powertools/powertools-lambda-python/pull/7848

Open Source as Shared Problem-Solving

What made this contribution meaningful wasn’t just the code.

It was the process:
• A well-documented community feature request
• Maintainer collaboration across runtimes
• Alignment with existing specifications
• A solution designed for long-term maintainability

This is open source at its best: turning recurring operational pain into shared infrastructure improvements.

Measuring What Actually Matters

Reliability isn’t about collecting more data.
It’s about choosing the signals that deserve to exist.

This change helps teams measure systems the way users experience them — not just the way dashboards prefer.

And that difference matters.

If you’re duplicating EMF emissions just to get different aggregation views, this should make your metrics simpler, clearer, and more reliable.

And if you run into edge cases, open an issue.

That’s how this ecosystem keeps improving.

Securing Cross-Account AWS Operations: Adding External ID Support to CDK AwsCustomResource

Michael Uanikehi — Thu, 20 Nov 2025 01:05:15 +0000

I recently contributed to the AWS Cloud Development Kit (CDK) by implementing External ID support for AwsCustomResource, a feature that enhances security for cross-account AWS operations. The pull request #35252 was merged into the main branch after a comprehensive review process, addressing a critical security gap identified in issue #34018.

This article walks through the problem, the solution, and the engineering decisions that went into this contribution.

The Problem: Confused Deputy Attacks

What is a Confused Deputy Attack?

In multi-account AWS environments, services often need to assume roles across accounts. A "confused deputy" attack occurs when a malicious actor tricks a service (the "deputy") into performing unauthorized actions by exploiting the trust relationship between accounts.

Consider this scenario:

Your Lambda function assumes a role in Account B using sts:AssumeRole
An attacker discovers your function's configuration
The attacker creates their own resource that tricks your function into assuming a role in their account instead
Your function unknowingly performs operations in the attacker's account

The Security Gap in AwsCustomResource

Before this contribution, AwsCustomResource supported cross-account operations via assumedRoleArn but lacked support for External IDs, a critical AWS security best practice. This forced developers to choose between:

Security: Skip cross-account functionality
Functionality: Accept increased security risk

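The Solution: External ID Support

What is an External ID?

An External ID is an agreed-upon value that must be provided when assuming a role. It acts as an extra condition on the trust relationship, ensuring that only callers that present both the correct role ARN and the expected External ID can assume the role. AWS does not treat the External ID as a secret in the way it treats credentials, so it complements IAM permissions rather than replacing them.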

// Trust policy in the assumed role (Account B)
{
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::ACCOUNT-A:role/CustomResourceRole" },
  "Action": "sts:AssumeRole",
  "Condition": {
    "StringEquals": {
      "sts:ExternalId": "my-secret-external-id-12345"
    }
  }
}
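The effect of that Condition block can be illustrated with a toy evaluator. Real IAM evaluation is far richer than this; the sketch only models a single Allow statement with an exact principal match, and the function name can_assume_role is hypothetical.

```python
from typing import Optional


def can_assume_role(statement: dict, caller_arn: str, external_id: Optional[str] = None) -> bool:
    """Toy check: does this trust statement let the caller assume the role?"""
    if statement.get("Effect") != "Allow" or statement.get("Action") != "sts:AssumeRole":
        return False
    if statement.get("Principal", {}).get("AWS") != caller_arn:
        return False
    expected = statement.get("Condition", {}).get("StringEquals", {}).get("sts:ExternalId")
    # With no ExternalId condition, the ARN alone is enough; with one, both must match
    return expected is None or expected == external_id


trust_statement = {
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::ACCOUNT-A:role/CustomResourceRole"},
    "Action": "sts:AssumeRole",
    "Condition": {"StringEquals": {"sts:ExternalId": "my-secret-external-id-12345"}},
}
caller = "arn:aws:iam::ACCOUNT-A:role/CustomResourceRole"
```

With this statement, the right caller without the External ID is rejected, which is the behaviour AwsCustomResource could not satisfy before the change.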

Implementation Overview

The implementation spans three key areas:

CDK Construct Interface (aws-custom-resource.ts)

export interface AwsSdkCall {
  // ... existing properties

  /**
   * The external ID to use when assuming the role for this call.
   * 
   * An external ID is a secret identifier that you define and share with the
   * account owner. It helps prevent the "confused deputy" problem in cross-account
   * scenarios.
   *
   * @see https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html
   * @default - No external ID is used
   */
  readonly externalId?: string;

  /**
   * The ARN of the role to assume for this call.
   * 
   * When specified with externalId, both must be provided to the AssumeRole call.
   *
   * @default - No role is assumed (calls are made with the Lambda function's role)
   */
  readonly assumedRoleArn?: string;
}

Key design decisions:

Optional property: Maintains backward compatibility
Comprehensive documentation: Explains security benefits and links to AWS best practices
Paired with assumedRoleArn: Only applies when cross-account operations are configured

Lambda Handler (custom-resource-handlers/lib/custom-resources/utils.ts)

async function getCredentials(assumedRoleArn?: string, externalId?: string): Promise<AWS.Credentials | undefined> {
  if (!assumedRoleArn) {
    return undefined;
  }

  const sts = new AWS.STS();
  const timestamp = new Date().getTime();

  const params: AWS.STS.AssumeRoleRequest = {
    RoleArn: assumedRoleArn,
    RoleSessionName: `AwsSdkCall-${timestamp}`,
  };

  // Add External ID if provided
  if (externalId) {
    params.ExternalId = externalId;
  }

  const { Credentials: assumedCredentials } = await sts.assumeRole(params).promise();

  if (!assumedCredentials) {
    throw new Error('Failed to assume role');
  }

  return new AWS.Credentials({
    accessKeyId: assumedCredentials.AccessKeyId,
    secretAccessKey: assumedCredentials.SecretAccessKey,
    sessionToken: assumedCredentials.SessionToken,
  });
}

The implementation:

Passes External ID to STS AssumeRole calls when provided
Maintains backward compatibility (works without External ID)
Uses existing AWS SDK patterns for consistency
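The backward-compatible part of the handler is the conditional parameter. Here is the same idea sketched in Python for readers more familiar with boto3. The function name build_assume_role_params and the now_ms argument are hypothetical; RoleArn, RoleSessionName, and ExternalId are the STS AssumeRole parameter names.

```python
import time
from typing import Optional


def build_assume_role_params(
    assumed_role_arn: str,
    external_id: Optional[str] = None,
    now_ms: Optional[int] = None,
) -> dict:
    """Build AssumeRole parameters, adding ExternalId only when one is given."""
    timestamp = now_ms if now_ms is not None else int(time.time() * 1000)
    params = {
        "RoleArn": assumed_role_arn,
        "RoleSessionName": f"AwsSdkCall-{timestamp}",
    }
    if external_id:
        # Omitting the key entirely keeps existing callers working unchanged
        params["ExternalId"] = external_id
    return params


with_id = build_assume_role_params(
    "arn:aws:iam::123456789012:role/MyRole", "my-external-id", now_ms=1700000000000
)
without_id = build_assume_role_params("arn:aws:iam::123456789012:role/MyRole", now_ms=1700000000000)
```

With boto3, a dictionary like this could be passed as sts_client.assume_role(**params). The sketch stops short of making the call so it runs without credentials.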

Type Safety (construct-types.ts)

export interface AwsSdkCall {
  service: string;
  action: string;
  parameters?: any;
  physicalResourceId?: PhysicalResourceId;
  assumedRoleArn?: string;
  externalId?: string;  // Added for type safety
  region?: string;
  apiVersion?: string;
  outputPaths?: string[];
}

Ensures type consistency between the CDK construct and Lambda handler.

Usage Examples

Basic Cross-Account Operation with External ID

import { AwsCustomResource, AwsCustomResourcePolicy, PhysicalResourceId } from 'aws-cdk-lib/custom-resources';
import { PolicyStatement, Effect } from 'aws-cdk-lib/aws-iam';

const customResource = new AwsCustomResource(this, 'CrossAccountOperation', {
  onCreate: {
    service: 'S3',
    action: 'putObject',
    parameters: {
      Bucket: 'cross-account-bucket',
      Key: 'data.json',
      Body: JSON.stringify({ message: 'Hello from Account A' }),
    },
    physicalResourceId: PhysicalResourceId.of('cross-account-s3-object'),
    assumedRoleArn: 'arn:aws:iam::ACCOUNT-B:role/CrossAccountS3Role',
    externalId: 'my-secret-external-id-12345',  // Security enhancement!
  },
  policy: AwsCustomResourcePolicy.fromStatements([
    new PolicyStatement({
      effect: Effect.ALLOW,
      actions: ['sts:AssumeRole'],
      resources: ['arn:aws:iam::ACCOUNT-B:role/CrossAccountS3Role'],
    }),
  ]),
});

Different External IDs per Operation

const multiOpResource = new AwsCustomResource(this, 'MultiOpResource', {
  onCreate: {
    service: 'DynamoDB',
    action: 'putItem',
    parameters: { /* ... */ },
    assumedRoleArn: 'arn:aws:iam::ACCOUNT-B:role/DynamoDBRole',
    externalId: 'create-external-id-abc123',
  },
  onUpdate: {
    service: 'DynamoDB',
    action: 'updateItem',
    parameters: { /* ... */ },
    assumedRoleArn: 'arn:aws:iam::ACCOUNT-B:role/DynamoDBRole',
    externalId: 'update-external-id-xyz789',  // Different External ID
  },
  onDelete: {
    service: 'DynamoDB',
    action: 'deleteItem',
    parameters: { /* ... */ },
    assumedRoleArn: 'arn:aws:iam::ACCOUNT-B:role/DynamoDBRole',
    externalId: 'delete-external-id-def456',  // Different External ID
  },
  policy: AwsCustomResourcePolicy.fromStatements([
    new PolicyStatement({
      effect: Effect.ALLOW,
      actions: ['sts:AssumeRole'],
      resources: ['arn:aws:iam::ACCOUNT-B:role/DynamoDBRole'],
    }),
  ]),
});

Validation and Testing
Comprehensive Test Coverage
The contribution includes three levels of testing:

Unit Tests (10 test cases)
External ID parameter propagation to CloudFormation
Different External IDs for different operations
Backward compatibility without External ID
Integration with existing assumedRoleArn
CloudFormation template validation
Edge cases and error scenarios

test('can specify external ID for cross-account operations', () => {
  const stack = new Stack();

  new AwsCustomResource(stack, 'MyResource', {
    onCreate: {
      service: 'S3',
      action: 'listBuckets',
      assumedRoleArn: 'arn:aws:iam::123456789012:role/MyRole',
      externalId: 'my-external-id',
      physicalResourceId: PhysicalResourceId.of('list-buckets'),
    },
    policy: AwsCustomResourcePolicy.fromSdkCalls({ resources: ['*'] }),
  });

  Template.fromStack(stack).hasResourceProperties('Custom::AWS', {
    Create: {
      assumedRoleArn: 'arn:aws:iam::123456789012:role/MyRole',
      externalId: 'my-external-id',
    },
  });
});

Integration Tests (4 scenarios)
Real cross-account role assumption
STS GetCallerIdentity validation
CDK snapshot validation
End-to-end workflow testing

const role = new iam.Role(stack, 'Role', {
  assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'),
  externalIds: ['external-id-12345'],
});

const resource = new cr.AwsCustomResource(stack, 'GetCallerIdentity', {
  onCreate: {
    service: 'STS',
    action: 'getCallerIdentity',
    assumedRoleArn: role.roleArn,
    externalId: 'external-id-12345',
    physicalResourceId: cr.PhysicalResourceId.of('caller-identity'),
  },
  policy: cr.AwsCustomResourcePolicy.fromSdkCalls({ resources: ['*'] }),
});

Lambda Handler Tests (7 test cases)
getCredentials function correctly passes External ID
STS AssumeRole includes ExternalId parameter
Backward compatibility verification
Type safety validation
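
The behaviour those handler tests pin down can be sketched in a few lines. This is a Python illustration of the idea only, since the real handler is TypeScript. build_assume_role_params and its argument names are hypothetical, while RoleArn, RoleSessionName, and ExternalId are the actual STS AssumeRole request parameters.

```python
from typing import Optional


def build_assume_role_params(
    role_arn: str,
    session_name: str,
    external_id: Optional[str] = None,
) -> dict:
    """Build the request parameters for an STS AssumeRole call.

    ExternalId is added only when the caller supplied one, so requests
    from existing users (no external ID) stay exactly as they were.
    """
    params = {"RoleArn": role_arn, "RoleSessionName": session_name}
    if external_id is not None:
        params["ExternalId"] = external_id
    return params
```

With boto3, the resulting dict could be passed straight through as sts.assume_role(**params).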

Design Decisions and Alternatives
Why Optional External ID?
Making externalId optional maintains backward compatibility. Existing users aren't forced to change their code, while new users can adopt the security best practice.

Why Per-Operation External IDs?
Different operations may require different security contexts. For example:

Create: Stricter security with unique External ID
Update: Different permissions, different External ID
Delete: Potentially destructive, requires highest security
This granularity provides maximum flexibility.

Key Takeaways for Contributors

Security-First Design
Always consider security implications, especially for cross-account operations. Reference AWS documentation and best practices.
Backward Compatibility is Critical
CDK is used by thousands of organizations. Any breaking change can affect production systems. Design features to be additive.
Comprehensive Testing
Unit tests, integration tests, and manual validation ensure robustness. Don't skimp on test coverage.
Documentation Matters
Inline documentation, README updates, and practical examples help users understand and adopt new features safely.
Follow Existing Patterns
Consistency with existing CDK patterns makes features more intuitive and maintainable.

Conclusion
Adding External ID support to AwsCustomResource closes a critical security gap in AWS CDK's multi-account capabilities. The feature:

Prevents confused deputy attacks
Maintains full backward compatibility
Follows AWS security best practices
Provides flexible per-operation configuration
Includes comprehensive testing and documentation

This contribution demonstrates that security enhancements don't have to come at the cost of usability or backward compatibility. By carefully considering design decisions and thoroughly testing the implementation, we can deliver features that make AWS CDK both more powerful and more secure.

Open Source as a Force Multiplier: How Small Fixes Scale Global Impact

Michael Uanikehi — Sun, 19 Oct 2025 23:50:41 +0000

Sometimes the things that change your career don’t start big.
They start with a tiny pull request.

For me, it was on AWS Lambda Powertools for Python, a library I’d used countless times in production.
I wasn’t trying to do anything revolutionary. I just noticed that working with form data in serverless APIs was a little… clunky. So, I opened an issue, wrote a few lines of code, added some tests, and pushed a PR.

That little change ended up helping thousands of developers build cleaner, better-documented AWS Lambda APIs.
And that’s when it hit me: open source is a force multiplier.

How a “tiny fix” became a big deal

When you contribute to open source, you’re not just solving your own problem. You’re solving everyone’s who runs into that same wall after you.

My contribution added OpenAPI form-data support. Basically, it let Lambda developers automatically document and validate form submissions without extra code. Nothing flashy. But it saved people time. It cleaned up their docs. It made their lives a little easier.

That’s the magic of open source. You make a small move, and somehow, it ripples outward into something much bigger than you.

What I learned in the process

The code itself wasn’t the hardest part. It was everything around it: the reviews, the feedback loops, the back-and-forth with maintainers who genuinely wanted to help me improve it.

Here’s what stood out to me:
1. Small scope doesn’t mean small impact. Sometimes the best contributions aren’t new features; they’re refinements that make everyone’s day smoother.
2. Good maintainers are secret teachers. They don’t just merge code; they help you understand why something should be done a certain way.
3. Documentation and tests matter just as much as code. They make your contribution live longer than you do in the repo.
4. Community is where confidence grows. Each PR teaches you a little more about how other engineers think, and that’s gold.

Why open source still matters

In a world where AI can generate a function in seconds, what still makes open source special isn’t the speed; it’s the shared understanding.

When you contribute, you’re not just adding lines of code; you’re adding patterns, lessons, and empathy into the ecosystem.
That’s what scales! That’s what lasts!

So, where do you start?

If you’ve been wanting to contribute but it feels intimidating, here’s my honest advice:
Start small.

You don’t need to build a new framework or rewrite an entire service. Sometimes, it’s about fixing something you bump into every day at work.
Fix a typo, improve a docstring, write a test for something untested.
The maintainers will thank you, and you’ll get the hang of the flow: the issues, the reviews, the merge.

Don’t wait for a “big idea.” Open source grows one small improvement at a time.

That’s literally how another one of my contributions happened.
I was working on an AWS project and noticed that the amazon-efs-utils tool wasn’t correctly handling the --region flag during cross-region mounts.
It was a tiny thing, but it caused big headaches for anyone mounting EFS volumes across AWS regions.

So I fixed it, opened a PR, and it got merged upstream.
Now, that small fix helps every engineer who uses EFS in multi-region setups.

That’s the beauty of open source: you’re not just patching your own problem; you’re preventing hundreds of others from hitting the same wall.

Fix the thing that slows you down today, and chances are, you’ll speed up someone else tomorrow.

Final thought

I’ve worked in large systems and seen how much effort goes into building reliable cloud platforms, but nothing compares to the shared power of open source.
It’s the one place where a single developer can write 10 lines of code and quietly make life easier for 10,000 others.

And that, to me, is the real definition of scale.

If you’ve ever hesitated to make your first contribution, this is your sign to go for it!!!

Turn log lines into alerts (without building a whole observability stack)

Michael Uanikehi — Sat, 13 Sep 2025 11:57:51 +0000

Cold truth: problems always show up in logs first. The trick is turning those “uh-oh” lines into a nudge in your inbox before users feel it.

Here’s the dead-simple pattern I use in AWS:

CloudWatch Logs → Metric Filter → Alarm → SNS (Email/Slack)

No new services to run. No extra agents. Just wiring.

Why this works

Think of CloudWatch Logs as a river. Metric filters are little nets you drop in: “catch anything that looks like ERROR” or “grab JSON where level=ERROR and service=payments.” Each catch bumps a metric. Alarms watch that metric and boom; email, Slack, PagerDuty, whatever you like.

Cheap. Fast. No app changes.

App → CloudWatch Logs ──(metric filter)──▶ Metric
                                             │
                                             └──▶ Alarm ──▶ SNS ──▶ Email/Slack

Step 1: create an SNS topic (so you get alerted)

aws sns create-topic --name app-alarms
# copy the "TopicArn" from the output
TOPIC_ARN="arn:aws:sns:REGION:ACCOUNT_ID:app-alarms"

# subscribe your email (confirm the email to activate)
aws sns subscribe \
  --topic-arn "$TOPIC_ARN" \
  --protocol email \
  --notification-endpoint you@example.com

Step 2: add a metric filter to your log group

Option A — simple keyword (“ERROR” but not health checks):

LOG_GROUP="/aws/lambda/my-fn"

aws logs put-metric-filter \
  --log-group-name "$LOG_GROUP" \
  --filter-name "ErrorCount" \
  --filter-pattern '"ERROR" -HealthCheck' \
  --metric-transformations \
      metricName=ErrorCount,metricNamespace="App/Alerts",metricValue=1,defaultValue=0

Option B — structured JSON logs (recommended):

aws logs put-metric-filter \
  --log-group-name "$LOG_GROUP" \
  --filter-name "PaymentsErrors" \
  --filter-pattern '{ $.level = "ERROR" && $.service = "payments" }' \
  --metric-transformations \
      metricName=PaymentsErrorCount,metricNamespace="App/Alerts",metricValue=1
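
To sanity-check a pattern before deploying it, it can help to approximate both options locally. The sketch below is a rough stand-in written in plain Python, not CloudWatch's actual matching engine, and the function names are made up for illustration.

```python
import json


def matches_keyword_filter(line: str) -> bool:
    """Approximates the pattern '"ERROR" -HealthCheck' (Option A)."""
    return "ERROR" in line and "HealthCheck" not in line


def matches_json_filter(line: str) -> bool:
    """Approximates '{ $.level = "ERROR" && $.service = "payments" }' (Option B)."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return False  # non-JSON lines never match a JSON pattern
    if not isinstance(event, dict):
        return False
    return event.get("level") == "ERROR" and event.get("service") == "payments"


def count_matches(lines, predicate) -> int:
    """Each matching line contributes metricValue=1 to the metric."""
    return sum(1 for line in lines if predicate(line))
```

Feeding a handful of real log lines through count_matches gives a quick feel for how noisy the resulting metric would be.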

Step 3: create an alarm on that metric

Alert if we see ≥ 1 error per minute for 3 minutes:

aws cloudwatch put-metric-alarm \
  --alarm-name "LambdaErrorBurst" \
  --metric-name ErrorCount \
  --namespace "App/Alerts" \
  --statistic Sum \
  --period 60 \
  --evaluation-periods 3 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --treat-missing-data notBreaching \
  --alarm-actions "$TOPIC_ARN" \
  --ok-actions "$TOPIC_ARN"

That treat-missing-data=notBreaching bit treats periods with no data points as healthy, so the alarm stays in OK instead of drifting into INSUFFICIENT_DATA when traffic is quiet.
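
To make the alarm settings above concrete, here is a simplified model of how they combine. It is a sketch only: CloudWatch's real evaluation has more nuance than this, and alarm_state is a hypothetical helper, not an AWS API.

```python
from typing import Optional, Sequence


def alarm_state(
    datapoints: Sequence[Optional[float]],
    threshold: float = 1,
    evaluation_periods: int = 3,
) -> str:
    """Return "ALARM" or "OK" for the most recent evaluation window.

    Each entry is the Sum for one 60-second period, or None when the
    period has no data. None is treated as not breaching, mirroring
    --treat-missing-data notBreaching.
    """
    window = list(datapoints)[-evaluation_periods:]
    if len(window) < evaluation_periods:
        return "OK"  # simplification: not enough history yet to alarm
    breaching = [dp is not None and dp >= threshold for dp in window]
    return "ALARM" if all(breaching) else "OK"
```

Three consecutive minutes with at least one error trip the alarm; a single quiet or missing minute inside the window keeps it in OK.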

Step 4: test it (don’t skip this)
1. Log an ERROR that matches your filter.
2. In CloudWatch Metrics → App/Alerts, make sure the metric ticks up.
3. Watch the alarm flip to ALARM and check your email.

If nothing happens, go to your Log Group → Metric filters → Test pattern and paste a real log line. It’ll tell you if your pattern matches.

Prefer Terraform? Here’s the whole thing

resource "aws_sns_topic" "app_alarms" {
  name = "app-alarms"
}

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.app_alarms.arn
  protocol  = "email"
  endpoint  = "you@example.com"
}

resource "aws_cloudwatch_log_metric_filter" "errors" {
  name           = "ErrorCount"
  log_group_name = "/aws/lambda/my-fn"
  pattern        = "\"ERROR\" -HealthCheck"

  metric_transformation {
    name          = "ErrorCount"
    namespace     = "App/Alerts"
    value         = "1"
    default_value = "0"
  }
}

resource "aws_cloudwatch_metric_alarm" "error_alarm" {
  alarm_name          = "LambdaErrorBurst"
  namespace           = "App/Alerts"
  metric_name         = "ErrorCount"
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 3
  threshold           = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  treat_missing_data  = "notBreaching"
  alarm_actions       = [aws_sns_topic.app_alarms.arn]
  ok_actions          = [aws_sns_topic.app_alarms.arn]
}

Common gotchas (learned the hard way)
• Case matters. ERROR ≠ error. Match what you actually log.
• Per line matching. Filters look at one line at a time. If your stack trace spans lines, rely on a JSON level field instead.
• Right account/region. Metric filters must live with the log group.
• Don’t explode cardinality. Keep one metric per signal; don’t bake IDs into metric names.
• No alerts during quiet times. That treat_missing_data setting is your chill pill.

Variations you’ll probably want
• Slack/Teams: SNS → lambda → Slack (point & click).
• PagerDuty/Opsgenie: SNS → EventBridge → your incident tool.
• Smarter thresholds: Try Anomaly Detection alarms once you have baseline traffic.
• Composite alarms: “Only alert if errors spike and p50 latency is ugly.”

You don’t need a massive observability rebuild to get useful alerts. Start with one or two high-signal patterns (timeouts, 5xx, “payment failed”), wire them to email, and iterate.

Tiny effort. Big safety net.

Killing cold starts with Lambda SnapStart

Michael Uanikehi — Thu, 14 Aug 2025 20:39:04 +0000

Serverless is amazing for scale, but cold starts are the silent tax we pay. In this post, we’ll unpack AWS Lambda SnapStart, how it works, what limitations matter, and how to enable it across Terraform, SAM, and CDK to slash your cold starts.

What SnapStart actually does:

When you publish a version of your Lambda, SnapStart:
1. Initializes your function once (imports, SDK clients, DB pools, frameworks, etc.).
2. Takes a snapshot of memory + runtime state after init.
3. Caches that snapshot.
4. On new execution environments, restores from that snapshot instead of doing INIT again.

Result: far less startup time, often ~10× faster for heavy-initialization workloads. You’ll see a new “Restore Duration” in logs and X-Ray that reflects snapshot restore time.

Supported runtimes & key limitations

Runtimes (managed):
• Java 11+, Python 3.12+, .NET 8+. Not supported for Node.js, Ruby, OS-only runtimes, or container images.

Limits & incompatibilities:
• Not compatible with Provisioned Concurrency (choose one).
• No EFS, and /tmp must be ≤ 512 MB.
• Works on published versions (and aliases pointing to them) — not $LATEST.
• Pricing: on Python and .NET, charged for cache time (while version is active) and per restore; cost scales with memory size. SnapStart on Java managed runtimes has no additional charge.

Lifecycle nuance:
• Snapshots for Python/.NET stay active as long as the version is active.
• For Java, a snapshot may expire after 14 days of inactivity.

When SnapStart shines
• Heavy init: big frameworks (Spring, .NET DI), large dependency graphs, JDBC/SDK client setup.
• Spiky or unpredictable traffic: where paying for Provisioned Concurrency would be wasteful.
• Latency-sensitive paths that still tolerate low-double-digit ms on restore.

When SnapStart is not the right tool:
• You must use EFS, >512MB ephemeral storage, or Provisioned Concurrency.
• Your function depends on unique state during INIT (see “uniqueness” below).

Enabling SnapStart with IaC

Terraform

resource "aws_lambda_function" "fn" {
  function_name = "snapstart-demo"
  runtime       = "python3.12" # or java11, java17, java21, dotnet8
  handler       = "app.handler"
  role          = aws_iam_role.lambda.arn
  filename      = "build.zip"

  # SnapStart only works on published versions — publish must be true
  publish = true

  snap_start {
    apply_on = "PublishedVersions"
  }
}

resource "aws_lambda_alias" "live" {
  name             = "live"
  function_name    = aws_lambda_function.fn.function_name
  function_version = aws_lambda_function.fn.version
}

Terraform requires publish = true and snap_start.apply_on = "PublishedVersions". Use an alias for safe releases.

AWS SAM (template.yaml)

Resources:
  SnapStartFn:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.12
      Handler: app.handler
      CodeUri: .
      AutoPublishAlias: live
      SnapStart:
        ApplyOn: PublishedVersions

SnapStart appears under Edit > Basic settings in console, but with SAM you set it declaratively and always publish a version + alias.

AWS CDK (TypeScript)

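import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';

const fn = new lambda.Function(this, 'Fn', {
  runtime: lambda.Runtime.PYTHON_3_12,
  code: lambda.Code.fromAsset('dist'),
  handler: 'app.handler',
  // SnapStartConf.ON_PUBLISHED_VERSIONS is the aws-cdk-lib equivalent of ApplyOn: PublishedVersions
  snapStart: lambda.SnapStartConf.ON_PUBLISHED_VERSIONS,
  currentVersionOptions: { removalPolicy: cdk.RemovalPolicy.DESTROY },
});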

new lambda.Alias(this, 'Live', { aliasName: 'live', version: fn.currentVersion });

Measuring the impact
• CloudWatch Logs: Each cold start shows Restore Duration and Billed Restore Duration in the REPORT line. Track these to quantify improvements.
• AWS X-Ray: Enable tracing to see a Restore subsegment alongside invocation. Great for comparing before/after.
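
If you export the REPORT lines, pulling the restore time out is a one-regex job. The sketch below assumes the field is labelled "Restore Duration: <n> ms" as described above; the sample line is illustrative and its numbers are made up.

```python
import re
from typing import Optional

# Match "Restore Duration" but not "Billed Restore Duration"
_RESTORE_RE = re.compile(r"(?<!Billed )Restore Duration: ([\d.]+) ms")


def restore_duration_ms(report_line: str) -> Optional[float]:
    """Return the restore time in milliseconds, or None if the line has none."""
    match = _RESTORE_RE.search(report_line)
    return float(match.group(1)) if match else None


# Illustrative REPORT line; the values are invented for the example.
SAMPLE = (
    "REPORT RequestId: 00000000-0000-0000-0000-000000000000 "
    "Duration: 12.34 ms Billed Duration: 13 ms Memory Size: 512 MB "
    "Max Memory Used: 80 MB Restore Duration: 245.67 ms "
    "Billed Restore Duration: 246 ms"
)
```

Lines without a restore (warm invocations, or versions without SnapStart) return None, which makes it easy to count restores versus regular invocations.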

Best practices & “uniqueness” gotchas

Because multiple execution environments start from the same snapshot, don’t create per-invoke unique state during init. Move anything that must be unique into the handler (or use after-restore hooks if your runtime supports them). Watch for:
• Random/UUID seeded at init → generate inside the handler.
• Ephemeral tokens/leases → fetch per request.
• Time-based logic in init → recompute on first invoke (or use after-restore).
See AWS docs on uniqueness and runtime hooks if you rely on special init behavior.
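
The uniqueness problem is easy to reproduce without Lambda at all. The sketch below fakes a snapshot with copy.deepcopy; ExecutionEnvironment is an illustrative stand-in, not how Lambda is implemented.

```python
import copy
import random
import uuid


class ExecutionEnvironment:
    """Stand-in for a Lambda execution environment (illustrative only)."""

    def __init__(self) -> None:
        # INIT phase: everything created here ends up inside the snapshot.
        self.rng = random.Random()        # seeded once, during init
        self.init_id = str(uuid.uuid4())  # "unique" ID created at init

    def handler_bad(self) -> tuple:
        # Relies on init-time state, so every restored copy repeats it.
        return self.init_id, self.rng.random()

    def handler_good(self) -> str:
        # Generated per invocation, after the restore.
        return str(uuid.uuid4())


snapshot = ExecutionEnvironment()
env_a = copy.deepcopy(snapshot)  # "restore" number one
env_b = copy.deepcopy(snapshot)  # "restore" number two
```

Calling handler_bad on env_a and env_b returns identical values, because both copies resumed from the same state; handler_good does not have that problem.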

What’s safe in INIT?
• SDK clients / DB pools (connection creation may still occur lazily).
• Static configuration (env vars, constants).
• Framework boot (Spring/.NET DI, serializers).

Cost awareness

You pay for (on Python and .NET; SnapStart on Java has no additional charge):
• Snapshot cache time while the version is active (per-ms, minimum 3 hours).
• Snapshot restore GB per resume.
Use smaller memory sizes where possible and clean up old versions to avoid lingering cache cost.

Real-world tuning checklist
• Publish versions + use aliases (live, beta) so each release creates a fresh snapshot.
• Warm critical paths: A canary/health check can invoke the alias after deploy to “prime” the version state.
• Log & trace: Compare cold start vs. restore using Restore Duration and X-Ray.
• Right-size memory: Faster CPU → faster restore, but balance against cache/restore costs.
• CI/CD: Account for snapshot creation time when publishing (deploy steps may take longer).

Quick examples by runtime

Python 3.12

app.py

import boto3
# OK to create clients in init – captured in snapshot
s3 = boto3.client("s3")

def handler(event, context):
    # Generate per-invoke UUIDs/timestamps inside the handler, not at import time
    # ...your logic...
    return {"ok": True}

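Java (11, 17, 21)
• Tune your framework bootstrap (Spring AOT, Micronaut, Quarkus). GraalVM native image is an alternative to SnapStart rather than a complement: native binaries run on OS-only runtimes, which SnapStart doesn’t support.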
• Avoid init-time randomness; move uniqueness to the handler.
• Enable X-Ray to visualize Restore vs Invocation.

.NET 8
• Heavy DI? Great SnapStart candidate.
• Ensure libraries don’t assume “fresh process” uniqueness during initialization.

SnapStart vs Provisioned Concurrency (PC)
• SnapStart: Pay per snapshot cache/restore; great for many bursty workloads; no EFS/PC.
• PC: Always-warm environments; extra charges constantly; works with EFS; deterministic lowest-latency.
You can’t enable both on the same version—pick the one that fits your constraints.

Final thoughts

If you’re running Java, Python 3.12+, or .NET 8+, SnapStart is a no-brainer to reduce latency without the always-on bill of Provisioned Concurrency. Enable it once, measure your gains, and enjoy faster cold starts at scale.

How to Handle Form Data in AWS Lambda APIs with Powertools OpenAPI Support

Michael Uanikehi — Wed, 06 Aug 2025 18:04:07 +0000

A complete guide to using the new Form parameter support in AWS Lambda Powertools for Python

AWS Lambda Powertools for Python now supports form data parameters in OpenAPI schema generation! This means you can build Lambda APIs that accept application/x-www-form-urlencoded data with automatic validation and documentation.

Why This Matters
Before this feature, Lambda APIs built with Powertools could only generate proper OpenAPI schemas for JSON payloads. If you needed to handle form data (like HTML forms or certain client applications), you had to:

Manually parse form data from the raw request body
Write custom validation logic
Maintain separate API documentation
Handle errors without proper validation feedback

Now you can handle form data declaratively with automatic validation, error handling, and OpenAPI documentation generation.

Getting Started

Installation
Make sure you have the latest version of AWS Lambda Powertools:
pip install --upgrade "aws-lambda-powertools[parser]"

Basic Form Handling
Here's how to create a Lambda function that accepts form data:

from typing import Annotated
from aws_lambda_powertools import Logger
from aws_lambda_powertools.event_handler import APIGatewayRestResolver
from aws_lambda_powertools.event_handler.openapi.params import Form

logger = Logger()
app = APIGatewayRestResolver(enable_validation=True)

@app.post("/contact")
def submit_contact_form(
    name: Annotated[str, Form(description="Contact's full name")],
    email: Annotated[str, Form(description="Contact's email address")],
    message: Annotated[str, Form(description="Contact message")]
):
    """Handle contact form submission."""
    logger.info("Processing contact form", extra={
        "name": name,
        "email": email
    })

    # Process the form data
    # (save to database, send email, etc.)

    return {
        "message": "Contact form submitted successfully",
        "contact_id": "12345"
    }

def lambda_handler(event, context):
    return app.resolve(event, context)

What You Get Automatically
With this simple setup, Powertools automatically provides:

Request Validation: Invalid form data returns proper 422 errors
OpenAPI Schema: Generated schema shows form fields and types
Error Handling: Detailed validation errors for debugging
Content-Type Detection: Automatically parses application/x-www-form-urlencoded
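
For a sense of the work this saves, here is roughly what parsing a form body by hand looks like using only the standard library. This is not Powertools' internal code, just a sketch; parse_form_body is a made-up name, and the base64 flag mirrors the isBase64Encoded field of an API Gateway proxy event.

```python
import base64
from urllib.parse import parse_qs


def parse_form_body(body: str, is_base64_encoded: bool = False) -> dict:
    """Decode an application/x-www-form-urlencoded body into a flat dict."""
    if is_base64_encoded:
        body = base64.b64decode(body).decode("utf-8")
    parsed = parse_qs(body, keep_blank_values=True)
    # parse_qs returns a list per key; keep the last value for repeated keys
    return {key: values[-1] for key, values in parsed.items()}
```

Type conversion, constraints, and error reporting would all still be missing at this point, which is the part the Form() annotations take over.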

Real-World Examples

User Registration Form:

from typing import Annotated, Optional
from aws_lambda_powertools.event_handler.openapi.params import Form
from pydantic import EmailStr

@app.post("/register")
def register_user(
    username: Annotated[str, Form(min_length=3, max_length=20)],
    email: Annotated[EmailStr, Form()],
    password: Annotated[str, Form(min_length=8)],
    newsletter: Annotated[bool, Form()] = False,
    referral_code: Annotated[Optional[str], Form()] = None
):
    """User registration endpoint."""

    # Automatic validation ensures:
    # - username is 3-20 characters
    # - email is valid format
    # - password is at least 8 characters
    # - newsletter is boolean
    # - referral_code is optional

    return {"user_id": "user_123", "status": "registered"}

Survey/Feedback Form:

from typing import Annotated, Optional
from enum import Enum

class SatisfactionLevel(str, Enum):
    VERY_SATISFIED = "very_satisfied"
    SATISFIED = "satisfied" 
    NEUTRAL = "neutral"
    DISSATISFIED = "dissatisfied"
    VERY_DISSATISFIED = "very_dissatisfied"

@app.post("/feedback")
def submit_feedback(
    overall_satisfaction: Annotated[SatisfactionLevel, Form()],
    product_rating: Annotated[int, Form(ge=1, le=5, description="Rating from 1-5")],
    comments: Annotated[str, Form(max_length=1000)],
    recommend: Annotated[bool, Form(description="Would you recommend us?")],
    improvements: Annotated[Optional[str], Form()] = None
):
    """Process customer feedback survey."""

    return {
        "feedback_id": "fb_456",
        "thank_you_message": "Thank you for your feedback!"
    }

Advanced Features

Custom Validation
You can add custom validation using Pydantic validators:

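import re
from typing import Annotated

from pydantic import BaseModel, field_validator

class ContactRequest(BaseModel):
    name: Annotated[str, Form()]
    email: Annotated[str, Form()]
    phone: Annotated[str, Form()]

    @field_validator('phone')
    @classmethod
    def validate_phone(cls, v: str) -> str:
        # Allow an optional leading "+", then digits, whitespace, hyphens, and parentheses.
        # The hyphen sits last in the character class so it is read literally, not as a range.
        if not re.match(r'^\+?[\d\s()-]+$', v):
            raise ValueError('Invalid phone number format')
        return v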

@app.post("/contact-advanced")
def advanced_contact(contact: ContactRequest):
    return {"status": "received", "contact_id": "contact_789"}

Error Handling
Form validation errors are automatically formatted:

# When invalid data is sent, you get detailed error responses:
{
    "detail": [
        {
            "type": "string_too_short",
            "loc": ["body", "username"],
            "msg": "String should have at least 3 characters",
            "input": "ab"
        },
        {
            "type": "value_error",
            "loc": ["body", "email"],
            "msg": "Invalid email format"
        }
    ]
}
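
On the client side, that detail list is easy to flatten into messages you can show next to form fields. A small sketch, assuming the response body has the shape shown above; summarize_validation_errors is a hypothetical helper, not part of Powertools.

```python
def summarize_validation_errors(response_body: dict) -> list:
    """Turn the `detail` entries of a 422 body into one-line messages."""
    messages = []
    for error in response_body.get("detail", []):
        # loc looks like ["body", "username"]; drop the leading "body"
        field = ".".join(str(part) for part in error.get("loc", [])[1:]) or "request"
        messages.append(f"{field}: {error.get('msg', 'invalid value')}")
    return messages
```

Run against the response above, this yields one message for username and one for email.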

Testing Your Form Endpoints

Using curl:
# Test the contact form
curl -X POST https://your-api.execute-api.region.amazonaws.com/contact \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "name=John Doe&email=john@example.com&message=Hello World"

Using Python requests:

import requests

response = requests.post(
    'https://your-api.execute-api.region.amazonaws.com/contact',
    data={
        'name': 'Jane Smith',
        'email': 'jane@example.com', 
        'message': 'Great service!'
    }
)

print(response.json())

HTML Form:

<form action="https://your-api.execute-api.region.amazonaws.com/contact" method="POST">
    <input type="text" name="name" required>
    <input type="email" name="email" required>
    <textarea name="message" required></textarea>
    <button type="submit">Send Message</button>
</form>

OpenAPI Documentation

Your form endpoints automatically generate proper OpenAPI documentation. Here's what the generated schema looks like:

paths:
  /contact:
    post:
      requestBody:
        required: true
        content:
          application/x-www-form-urlencoded:
            schema:
              type: object
              properties:
                name:
                  type: string
                  description: "Contact's full name"
                email:
                  type: string
                  description: "Contact's email address"
                message:
                  type: string
                  description: "Contact message"
              required: ["name", "email", "message"]

Integration with Swagger UI
Enable Swagger UI to get an interactive API explorer:

app = APIGatewayRestResolver(enable_validation=True)
app.enable_swagger(
    path="/docs",
    title="My Contact API",
    version="1.0.0"
)

Deployment Best Practices

Environment Configuration

import os
from aws_lambda_powertools import Logger, Tracer, Metrics

logger = Logger()
tracer = Tracer()
metrics = Metrics()

# Environment-specific settings
DEBUG = os.getenv('DEBUG', 'false').lower() == 'true'
MAX_MESSAGE_LENGTH = int(os.getenv('MAX_MESSAGE_LENGTH', '1000'))

app = APIGatewayRestResolver(
    enable_validation=True,
    debug=DEBUG
)

SAM Template:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  ContactFormFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.11
      Environment:
        Variables:
          POWERTOOLS_SERVICE_NAME: contact-api
          MAX_MESSAGE_LENGTH: 2000
      Events:
        ContactForm:
          Type: Api
          Properties:
            Path: /contact
            Method: post

Monitoring and Observability

Powertools automatically provides structured logging:

@app.post("/contact")
@tracer.capture_method
def submit_contact_form(name: Annotated[str, Form()], email: Annotated[str, Form()]):

    # Add custom metrics
    metrics.add_metric(name="ContactFormSubmission", unit="Count", value=1)
    metrics.add_metadata(key="email_domain", value=email.split('@')[1])

    # Structured logging
    logger.info("Contact form submitted", extra={
        "name": name,
        "email_domain": email.split('@')[1]
    })

    return {"status": "success"}

Security Considerations

Input Sanitization

import html

@app.post("/comment")
def submit_comment(
    content: Annotated[str, Form(max_length=500)]
):
    # Sanitize HTML content
    clean_content = html.escape(content)

    # Additional sanitization as needed
    return {"comment_id": "comment_123"}

Rate Limiting
Use AWS API Gateway throttling or implement custom rate limiting:

from functools import wraps
import time

def rate_limit(max_requests_per_minute=10):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Implement rate limiting logic
            return func(*args, **kwargs)
        return wrapper
    return decorator

@app.post("/contact")
@rate_limit(max_requests_per_minute=5)
def submit_contact_form(name: Annotated[str, Form()]):
    return {"status": "received"}
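
The placeholder above could be filled in with a simple sliding window. Treat the following as a sketch under a big assumption: the counter lives in the memory of one execution environment, so it is not shared across concurrent Lambda environments and it limits all callers together rather than per client. RateLimitExceeded and the clock parameter are additions for illustration; for real throttling, API Gateway or a shared store is the safer choice.

```python
import time
from collections import deque
from functools import wraps


class RateLimitExceeded(Exception):
    """Raised when the wrapped function is called too often."""


def rate_limit(max_requests_per_minute=10, clock=time.monotonic):
    def decorator(func):
        calls = deque()  # timestamps of recent calls, oldest first

        @wraps(func)
        def wrapper(*args, **kwargs):
            now = clock()
            # Drop timestamps that have left the 60-second window
            while calls and now - calls[0] >= 60:
                calls.popleft()
            if len(calls) >= max_requests_per_minute:
                raise RateLimitExceeded(
                    f"more than {max_requests_per_minute} calls in 60 seconds"
                )
            calls.append(now)
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

The injectable clock exists only so the window can be exercised in tests without sleeping.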

Performance Tips

Use appropriate field constraints to validate data early
Enable response compression for large form responses
Implement caching for expensive validation operations
Use async patterns for external API calls during form processing

Additional Resources:
AWS Lambda Powertools Documentation: https://docs.powertools.aws.dev/lambda/python/latest/

Example Applications Repository: https://github.com/aws-powertools/powertools-lambda-python/tree/develop/examples

Ready to build better form-handling Lambda APIs? Try out the new Form parameter support and let me know how it works for your use cases!

Mounting Amazon EFS Across 3 Regions (Kubernetes + EC2): Work-arounds, Tweaks & Startup Automation

Michael Uanikehi — Tue, 22 Jul 2025 22:11:40 +0000

Mounting Amazon EFS across multiple AWS regions is not something you do every day but when you need to, the pain becomes real. In this article, I’ll walk through how I achieved cross-region EFS mounting from three AWS regions into Kubernetes (EKS) and EC2-based deployment. We’ll cover the architecture, common pitfalls, and practical work-arounds for both environments.

EFS DNS is regional. Each mount helper expects the region-specific hostname (e.g., fs-1234.efs.us-east-1.amazonaws.com). When you point that hostname at a mount target in a different region, the helper often fails especially inside Kubernetes because it does a DNS check and won’t trust /etc/hosts overrides.

Why Cross-Region EFS Mounting?

We manage workloads that span multiple AWS regions to support high availability and global financial clients. These workloads rely on shared file systems, and Amazon EFS was our go-to choice. However, EFS is not designed for seamless cross-region mounting. We needed a way to mount EFS file systems from all three regions on compute running in a single region.

This was technically possible but full of edge cases.

The Architecture:
- EFS volumes in three regions (us-east-1, eu-west-1, and ap-southeast-1)
- EKS cluster and EC2 in us-east-1
- VPC peering between regions (NFS port 2049 open)
- Mount helper: amazon-efs-utils

Common Problems

amazon-efs-utils requires AWS-provided DNS names, not custom hostnames or CNAMEs
When used inside Kubernetes, DNS resolution often fails or defaults to 127.0.0.1
Even when using hostAliases in pods, the mount helper doesn’t always respect it
IAM role mismatch between pod and node leads to permission errors

Kubernetes (EKS) Approach

Key Steps

Install amazon-efs-utils in an init container (or bake it into the image).
Resolve the mount-target IPs for each region.
Add IP/hostname pairs to /etc/hosts via hostAliases or an init container.
Mount with tls,iam,region=<SOURCE_REGION> options.

env:
  - name: AWS_REGION
    value: "us-east-1"
hostAliases:
  - ip: "10.94.117.128"
    hostnames:
      - "{efs id}.efs.us-east-1.amazonaws.com"
  - ip: "10.94.125.126"
    hostnames:
      - "{efs id}.efs.ap-southeast-1.amazonaws.com"
  - ip: "10.94.109.68"
    hostnames:
      - "{efs id}.efs.eu-west-1.amazonaws.com"
args:
  # Assumes the container command is ["/bin/sh", "-c"]
  - |
    yum install -y amazon-efs-utils
    # Create mount points
    mkdir -p /mnt/efs-east /mnt/efs-ap /mnt/efs-eu
    # Mount each EFS
    mount -t efs -o tls,iam,region=us-east-1 ${EFS_ID}:/ /mnt/efs-east
    mount -t efs -o tls,iam,region=ap-southeast-1 ${EFS_ID}:/ /mnt/efs-ap
    mount -t efs -o tls,iam,region=eu-west-1 ${EFS_ID}:/ /mnt/efs-eu

Challenges:
Mount helper may still ignore /etc/hosts
Pod IAM must allow: elasticfilesystem:ClientMount + ClientWrite
You can only mount if IP is reachable from the current AZ/subnet

EC2 Approach

Much easier than EKS thanks to direct /etc/hosts control.
User Data Script:

#!/bin/bash
yum install -y amazon-efs-utils

# Resolve EFS hostnames manually
# (EFS_ID is a placeholder: substitute each region's own file system ID)
echo "10.0.111.100 ${EFS_ID}.efs.us-east-1.amazonaws.com" >> /etc/hosts
echo "10.0.111.101 ${EFS_ID}.efs.ap-southeast-1.amazonaws.com" >> /etc/hosts
echo "10.0.111.102 ${EFS_ID}.efs.eu-west-1.amazonaws.com" >> /etc/hosts

# Create mount points
mkdir -p /mnt/efs-east /mnt/efs-ap /mnt/efs-eu

# Mount each EFS
mount -t efs -o tls,iam,region=us-east-1 ${EFS_ID}:/ /mnt/efs-east
mount -t efs -o tls,iam,region=ap-southeast-1 ${EFS_ID}:/ /mnt/efs-ap
mount -t efs -o tls,iam,region=eu-west-1 ${EFS_ID}:/ /mnt/efs-eu
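
Appending with >> adds a duplicate line every time the script runs again. One way to keep a single mapping per hostname is sketched below in Python. The function name is hypothetical, and it works on the file contents as a string so that the caller decides how to read and write /etc/hosts.

```python
from typing import Dict


def upsert_hosts(hosts_text: str, entries: Dict[str, str]) -> str:
    """Return hosts_text with exactly one line per hostname in entries.

    entries maps hostname -> IP. Lines that already mention one of the
    hostnames are dropped, then fresh lines are appended.
    """
    kept = []
    for line in hosts_text.splitlines():
        # Ignore trailing comments when looking for hostnames on the line.
        fields = line.split("#", 1)[0].split()
        if any(name in entries for name in fields[1:]):
            continue  # stale mapping for a hostname we manage
        kept.append(line)
    kept.extend(f"{ip} {name}" for name, ip in entries.items())
    return "\n".join(kept) + "\n"
```

Running it twice with the same entries yields the same text, so it is safe to call on every boot.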

Lessons Learned

Only one mount target IP per hostname works
If the IP is unreachable, mount fails completely, even if DNS resolves
IAM must be correct on every node or pod
EKS needs extra care due to kube-dns + IAM
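
Because an unreachable IP makes the mount fail outright, it can help to probe the NFS port before mounting. The check below is a minimal sketch with a hypothetical function name. A successful TCP connection only shows that the network path is open, not that IAM authorization will pass.

```python
import socket


def nfs_reachable(ip: str, port: int = 2049, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable addresses.
        return False
```

Running this from each AZ where pods or instances are scheduled gives a quick signal about peering routes and security groups before the mount helper is involved.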

Recommendations

Use EC2 where possible if reliability matters
In EKS, use hostAliases but always test per AZ
Consider building a helper script or sidecar to handle resolution dynamically
Use region-specific mount options (e.g. -o tls,iam,region=eu-west-1)
Stay close to the official AWS guidance on mounting EFS file systems from another region

Bonus: Automate It

You can automate:

IP resolution via boto3
IAM role patching
Host file injection via DaemonSet
Mount validation in init containers
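
As an illustration of the first item, IP resolution could look roughly like this. The helper name and its AZ preference logic are assumptions, not the author's script. It takes the EFS client as an argument so it can be exercised without AWS access, and it assumes a single describe_mount_targets call returns every mount target for the file system.

```python
from typing import Any, Optional


def pick_mount_target_ip(efs_client: Any, file_system_id: str,
                         preferred_az: Optional[str] = None) -> str:
    """Return one available mount-target IP, preferring preferred_az if given."""
    response = efs_client.describe_mount_targets(FileSystemId=file_system_id)
    available = [
        mt for mt in response["MountTargets"]
        if mt.get("LifeCycleState") == "available"
    ]
    if not available:
        raise RuntimeError(f"no available mount targets for {file_system_id}")
    for mt in available:
        if preferred_az and mt.get("AvailabilityZoneName") == preferred_az:
            return mt["IpAddress"]
    # Fall back to the first available target when no AZ matches.
    return available[0]["IpAddress"]
```

With boto3 installed, a client created via boto3.client("efs", region_name="eu-west-1") can be passed as efs_client, once per region, and the returned IP written into hostAliases or /etc/hosts.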

Final Thoughts

Cross-region EFS mounting does work — but it’s fragile. Knowing how DNS, IPs, IAM, and Linux internals interact is key to making it reliable. If you’ve ever fought 127.0.0.1 DNS resolution in a pod, or had a mount fail mysteriously in one AZ but not another, this article is for you.
