Anish Ummenthala

Posted on Jun 29 • Originally published at Medium

Redacting Sensitive Data On-the-Fly with S3 Object Lambda

#aws #lambda #s3 #security

“Mask what you must, without touching the source.”

Many modern applications store user data in Amazon S3, including potentially sensitive details like names, emails, or contact information. But what if you could automatically hide the private data only when it's accessed, without ever touching the original file?

That's where Amazon S3 Object Lambda comes in.

In this post, I'll show you how to build a simple PII (Personally Identifiable Information) masker using the AWS CDK, S3 Access Points, and a Lambda function that redacts email addresses from .txt files in real time.

Why On-the-Fly Transformation?

Traditionally, redacting sensitive information in S3 means:

Modifying the original file to sanitize information
Storing the sanitized copies
Managing multiple versions of the same data

But this approach is clunky and error-prone: every sanitized copy has to be kept in sync with its source.

With S3 Object Lambda, you can intercept S3 GET requests and transform data on the fly without duplicating the source files. It acts as serverless middleware for S3 objects.

What is an S3 Access Point?

An S3 Access Point is like a custom entry point to your S3 bucket. Instead of relying on one big, complicated bucket policy, you can create several of these entry points, each with its own name and set of rules. This lets you control who gets in and what they can access more easily.

For example, you might have:

One access point for your internal team,
Another one that allows the public only to read files,
And a third one reserved for a Lambda function.

What is an S3 Object Lambda?

S3 Object Lambda lets you customize or transform data on the fly as it is retrieved from S3. It sits between the client and the original object, invoking a Lambda function to adjust the data (masking sensitive info, reformatting content, or filtering fields) before it's returned.

It's an intelligent filter that processes the file just before it reaches the user. Some common ways it's used include:

Removing sensitive information like credit card numbers, email addresses, or personal information.
Creating content on the fly, such as resizing images or adding watermarks right before download.
Changing file formats as needed, for example, converting a CSV file into JSON without storing both versions.
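To make the flow concrete, here is a rough sketch of the event that S3 Object Lambda hands to the function on each GET request. The field names (getObjectContext, inputS3Url, outputRoute, outputToken, userRequest) come from the Object Lambda event format; every value below is a placeholder, and describe_request is a hypothetical helper used only for illustration.

```python
# Trimmed example of the event S3 Object Lambda passes to the function.
# All values are placeholders; real events carry additional fields.
SAMPLE_EVENT = {
    "getObjectContext": {
        "inputS3Url": "https://example-ap-111122223333.s3-accesspoint.us-east-1.amazonaws.com/test.txt?X-Amz-Signature=placeholder",
        "outputRoute": "io-use1-001",
        "outputToken": "placeholder-token",
    },
    "userRequest": {
        "url": "https://example-olap-111122223333.s3-object-lambda.us-east-1.amazonaws.com/test.txt",
        "headers": {"Host": "placeholder"},
    },
}


def describe_request(event):
    """Pull out the fields a transforming handler typically needs."""
    ctx = event["getObjectContext"]
    return {
        "source_url": ctx["inputS3Url"],   # presigned URL for the original object
        "route": ctx["outputRoute"],       # sent back as RequestRoute
        "token": ctx["outputToken"],       # sent back as RequestToken
        "requested_url": event["userRequest"]["url"],
    }
```

The route and token are what tie the function's response back to the waiting client request.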

Hands-On: On-the-Fly PII Redactor

Let's get our hands dirty and build something real: a PII redactor that automatically masks email addresses from text files stored in S3 without changing the original files.

You can follow along or clone the complete working code from my GitHub repo:
PII Redaction with Object Lambda

We'll use AWS CDK to provision the entire setup:

S3 bucket to store raw sensitive data
Standard S3 Access Point
Lambda function that performs redaction
Object Lambda Access Point that triggers the Lambda on reads

1. Creating the Raw Data Bucket

We create an S3 bucket configured with autoDeleteObjects: true and removalPolicy: DESTROY, so the bucket and its contents are removed automatically when the stack is destroyed. This is convenient for dev or test environments, but not something you want in production.

const bucket = new Bucket(this, "RawDataBucket", {
  bucketName: "raw-bucket-with-sensitive-data",
  removalPolicy: RemovalPolicy.DESTROY,
  autoDeleteObjects: true,
});

2. Creating a Standard S3 Access Point

We create an access point that provides a secure and controlled interface for Lambda to read files from the raw data bucket. The ARN created here will be used later to indirectly connect the Object Lambda Access Point to the bucket.

const accessPoint = new CfnS3AccessPoint(this, "S3AccessPoint", {
  // Reference the construct (not a string literal) so CloudFormation creates the bucket first
  bucket: bucket.bucketName,
  name: "bucket-access-point",
});

const s3AccessPointArn = `arn:aws:s3:${this.region}:${this.account}:accesspoint/bucket-access-point`;

3. Creating the Lambda Function

We create the function that Object Lambda invokes to mask the data. The Lambda code is written in Python, and the handler is index.lambda_handler.

const lambdaFn = new Function(this, "PiiMaskerLambda", {
  runtime: Runtime.PYTHON_3_9,
  functionName: "maskerLambdaFunction",
  code: Code.fromAsset("lambda"),
  handler: "index.lambda_handler",
  environment: {
    RAW_BUCKET_NAME: "raw-bucket-with-sensitive-data",
  },
  logRetention: RetentionDays.ONE_WEEK,
});

4. Granting Access Permissions

The Lambda role receives two grants:

s3:GetObject on objects reached through the Access Point, via an explicit policy statement
Read access on the raw bucket, via bucket.grantRead, because the handler in step 7 fetches the object by bucket name

lambdaFn.addToRolePolicy(
  new PolicyStatement({
    actions: ["s3:GetObject"],
    resources: [`${s3AccessPointArn}/object/*`],
  })
);

bucket.grantRead(lambdaFn);
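If you would rather not give the function any direct read access, one alternative is to fetch the original object through the presigned URL that S3 Object Lambda includes in the event as getObjectContext.inputS3Url. The sketch below is not what this post's handler does; fetch_original and its opener parameter are hypothetical names, and the opener is injectable only so the function can be exercised without a network call.

```python
import urllib.request


def fetch_original(event, opener=urllib.request.urlopen):
    """Download the untransformed object via the presigned URL in the event.

    `opener` defaults to urllib's urlopen; pass a stub in tests.
    """
    url = event["getObjectContext"]["inputS3Url"]
    with opener(url) as resp:
        return resp.read()
```

Because the URL is already signed for the supporting Access Point, this variant needs no extra credentials inside the function for the read itself.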

5. Creating the Object Lambda Access Point

We then configure an Object Lambda Access Point, which invokes the Lambda during object retrieval, transforming the output. This is the gateway that clients will call instead of the bucket. It's linked to:

The regular S3 Access Point
The Lambda function (for transformation)

const objectLambdaAccessPoint = new CfnS3ObjectLambdaAccessPoint(
  this,
  "S3ObjectLambdaAccessPoint",
  {
    name: "object-lambda-access-point",
    objectLambdaConfiguration: {
      supportingAccessPoint: s3AccessPointArn,
      transformationConfigurations: [
        {
          actions: ["GetObject"],
          contentTransformation: {
            AwsLambda: {
              FunctionArn: lambdaFn.functionArn,
            },
          },
        },
      ],
    },
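  }
);

// ARN of the Object Lambda Access Point, used by the permissions in the next step
const objectLambdaAccessPointArn = objectLambdaAccessPoint.attrArn;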

6. Allowing Object Lambda to Use Lambda

We add two permissions:

Allow the S3 Object Lambda service principal to invoke our redaction Lambda
Allow the Lambda's execution role to return the transformed object using s3-object-lambda:WriteGetObjectResponse

/**
 * Allow the S3 Object Lambda service to invoke the Lambda function
 */
lambdaFn.addPermission("AllowObjectLambdaInvoke", {
  principal: new ServicePrincipal("s3-object-lambda.amazonaws.com"),
  sourceArn: objectLambdaAccessPointArn,
});

/**
 * Allow Lambda to return a transformed version of the object
 */
lambdaFn.addToRolePolicy(
  new PolicyStatement({
    actions: ["s3-object-lambda:WriteGetObjectResponse"],
    resources: [objectLambdaAccessPointArn],
  })
);

7. Lambda Handler (Python)

Below is the Python code to redact the sensitive email addresses from the text files.

import os
import re
import boto3
from urllib.parse import unquote, urlparse

s3 = boto3.client("s3")

# Regex to match emails
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9.+_-]+@[a-zA-Z0-9._-]+\.[a-zA-Z]+")

def lambda_handler(event, context):
    # Determine the requested key (URL-decoded so keys with spaces or special characters resolve)
    requested_url = event["userRequest"]["url"]
    key = unquote(urlparse(requested_url).path.lstrip("/"))

    # Fetch the raw object
    bucket = os.environ["RAW_BUCKET_NAME"]
    resp = s3.get_object(Bucket=bucket, Key=key)
    body = resp["Body"].read()

    # Only redact emails in .txt files
    if key.lower().endswith(".txt"):
        try:
            text = body.decode("utf-8")
            redacted = EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)
            output = redacted.encode("utf-8")
            print(f"Redacted emails in {key}")
        except UnicodeDecodeError:
            # Not valid UTF-8: pass the original bytes through unchanged
            output = body
    else:
        output = body

    # Return to the caller
    s3.write_get_object_response(
        Body=output,
        RequestRoute=event["getObjectContext"]["outputRoute"],
        RequestToken=event["getObjectContext"]["outputToken"],
    )
    return {"status_code": 200}
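Before deploying, it can help to check the redaction logic on its own. The sketch below lifts the .txt check and the regex substitution out of the handler into a pure function; redact_body is a hypothetical name, and the sample inputs are made up for illustration.

```python
import re

# Same pattern as the handler above
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9.+_-]+@[a-zA-Z0-9._-]+\.[a-zA-Z]+")


def redact_body(key, body):
    """Return `body` with emails masked if `key` is a UTF-8 .txt file."""
    if not key.lower().endswith(".txt"):
        return body
    try:
        text = body.decode("utf-8")
    except UnicodeDecodeError:
        # Not decodable as text: leave the bytes untouched
        return body
    return EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text).encode("utf-8")


if __name__ == "__main__":
    sample = b"Contact jane.doe@example.com or bob+test@mail.example.org for details."
    print(redact_body("notes.txt", sample).decode("utf-8"))
    # prints: Contact [REDACTED_EMAIL] or [REDACTED_EMAIL] for details.
```

Keeping the transformation free of AWS calls means it can be covered by ordinary unit tests, with the handler reduced to fetching the object and writing the response.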

Testing via AWS CLI

Upload a .txt file containing emails to your S3 bucket.

aws s3 cp <path_to_local_txt_file> s3://raw-bucket-with-sensitive-data/test.txt

Get Object via Object Lambda Access Point

aws s3api get-object \
  --bucket arn:aws:s3-object-lambda:<region>:<account>:accesspoint/object-lambda-access-point \
  --key test.txt redacted-output.txt

Replace <region> and <account> with your AWS values.

Output: The emails will be replaced with [REDACTED_EMAIL].
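The same retrieval can be done from Python. This is a minimal sketch, assuming boto3 is installed and your credentials are allowed to read through the Object Lambda Access Point; fetch_redacted is a hypothetical helper, and the client is passed in so the function can be tested with a stub.

```python
def fetch_redacted(s3_client, region, account_id, key,
                   olap_name="object-lambda-access-point"):
    """Read an object through the Object Lambda Access Point and return its text."""
    # The Object Lambda Access Point ARN goes where a bucket name would normally go
    arn = f"arn:aws:s3-object-lambda:{region}:{account_id}:accesspoint/{olap_name}"
    resp = s3_client.get_object(Bucket=arn, Key=key)
    return resp["Body"].read().decode("utf-8")


# Usage (placeholder region and account ID):
#   import boto3
#   print(fetch_redacted(boto3.client("s3"), "us-east-1", "111122223333", "test.txt"))
```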

Don't Forget to Clean Up! 🚨

Once you've finished testing the redaction flow, delete the resources you provisioned to avoid incurring charges. You can destroy everything with a single command:

cdk destroy

This removes:

The S3 bucket and objects (if autoDeleteObjects: true)
Lambda function and associated roles
S3 Access Point
S3 Object Lambda Access Point
IAM policies

⚠️ Note: Since we are using CDK and logRetention, the log events inside CloudWatch log groups will expire automatically after the retention period (7 days). However, the log groups themselves are not deleted by default. To clean them up completely, delete them with the AWS CLI or the AWS Console, for example by filtering on a project tag.
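As one way to script that cleanup, the sketch below deletes log groups by name prefix rather than by tag (a simpler filter, swapped in here for brevity). It assumes a boto3 CloudWatch Logs client and that the function logs to the default /aws/lambda/<functionName> group; delete_log_groups is a hypothetical helper.

```python
def delete_log_groups(logs_client, prefix):
    """Delete every CloudWatch log group whose name starts with `prefix`.

    Returns the names of the deleted groups.
    """
    deleted = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate(logGroupNamePrefix=prefix):
        for group in page["logGroups"]:
            logs_client.delete_log_group(logGroupName=group["logGroupName"])
            deleted.append(group["logGroupName"])
    return deleted


# Usage:
#   import boto3
#   delete_log_groups(boto3.client("logs"), "/aws/lambda/maskerLambdaFunction")
```

Deletion is irreversible, so it is worth printing the matching names first and keeping the prefix as specific as possible.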