“Mask what you must, without touching the source.”
Many modern applications store user data in Amazon S3, including potentially sensitive details such as names, email addresses, or contact information. But what if you could automatically hide that private data whenever it's accessed, without ever touching the original file?
That's where Amazon S3 Object Lambda comes in.
In this post, I'll show you how to build a simple PII (Personally Identifiable Information) masker with the AWS CDK, S3 Access Points, and a Lambda function that redacts email addresses from .txt files in real time.
Why On-the-Fly Transformation?
Traditionally, redacting sensitive information in S3 means:
- Modifying the original file to sanitize information
- Storing the sanitized copies
- Managing multiple versions of the same data
This approach is clunky and error-prone: every sanitized copy has to be regenerated and kept in sync whenever the original changes.
With S3 Object Lambda, you can intercept S3 GET requests and transform the data on the fly without duplicating the source files. Think of it as serverless middleware for S3 objects.
What is an S3 Access Point?
An S3 Access Point is like a custom entry point to your S3 bucket. Instead of relying on one large, complicated bucket policy, you can create several entry points, each with its own name and access policy. This makes it easier to control who can get in and what they can do.
For example, you might have:
- One access point for your internal team,
- Another that gives the public read-only access,
- And a third reserved for a Lambda function.
What is S3 Object Lambda?
S3 Object Lambda lets you customize or transform data on the fly as it's retrieved from S3. It sits between the client and the original object, invoking a Lambda function to modify the data (for example, masking sensitive information, reformatting content, or filtering fields) before it's returned.
Think of it as a smart filter that processes the file just before it reaches the user. Common uses include:
- Removing sensitive information like credit card numbers, email addresses, or personal information.
- Creating content on the fly, such as resizing images or adding watermarks right before download.
- Changing file formats as needed, for example, converting a CSV file into JSON without storing both versions.
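The following sketch makes the format-conversion use case concrete. It shows the kind of transformation an Object Lambda function could apply before returning an object. It assumes UTF-8 CSV with a header row. `csv_to_json` is a hypothetical helper for illustration, not an AWS API. The AWS plumbing (reading the object and calling WriteGetObjectResponse) is left out.

```python
import csv
import io
import json


def csv_to_json(body: bytes) -> bytes:
    """Convert UTF-8 CSV bytes (first row = header) into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(body.decode("utf-8")))
    rows = list(reader)  # one dict per data row, keyed by the header names
    return json.dumps(rows).encode("utf-8")


# What a client would receive instead of the raw CSV
raw = b"name,city\nAda,London\nLinus,Helsinki\n"
print(csv_to_json(raw).decode("utf-8"))
# [{"name": "Ada", "city": "London"}, {"name": "Linus", "city": "Helsinki"}]
```

Only the transformed JSON goes to the caller; the stored CSV is never rewritten.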
Hands-On: On-the-Fly PII Redactor
Let's get our hands dirty and build something real: a PII redactor that automatically masks email addresses in text files stored in S3, without changing the original files.
You can follow along or clone the complete working code from my GitHub repo:
PII Redaction with Object Lambda
We'll use AWS CDK to provision the entire setup:
- S3 bucket to store raw sensitive data
- Standard S3 Access Point
- Lambda function that performs redaction
- Object Lambda Access Point that triggers the Lambda on reads
1. Creating the Raw Data Bucket
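We create an S3 bucket with autoDeleteObjects: true and removalPolicy: DESTROY, so the bucket and its contents are removed automatically when the stack is destroyed. That's ideal for development or test environments. Bucket names are globally unique, so if raw-bucket-with-sensitive-data is already taken, pick your own name and use it consistently below.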
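import { RemovalPolicy } from "aws-cdk-lib";
import { Bucket } from "aws-cdk-lib/aws-s3";
const bucket = new Bucket(this, "RawDataBucket", {
  bucketName: "raw-bucket-with-sensitive-data",
  removalPolicy: RemovalPolicy.DESTROY, // delete the bucket on cdk destroy
  autoDeleteObjects: true, // empty it first so the delete can succeed
});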
2. Creating a Standard S3 Access Point
We create an access point that gives the Lambda function a controlled way to read files from the raw data bucket. Its ARN is used later as the supporting access point that connects the Object Lambda Access Point to the bucket.
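// Assumes CfnS3AccessPoint is an alias for the L1 construct CfnAccessPoint
import { CfnAccessPoint as CfnS3AccessPoint } from "aws-cdk-lib/aws-s3";
const accessPoint = new CfnS3AccessPoint(this, "S3AccessPoint", {
  bucket: bucket.bucketName, // references the bucket, so it is created first
  name: "bucket-access-point",
});
// Resolves to arn:aws:s3:<region>:<account>:accesspoint/bucket-access-point
const s3AccessPointArn = accessPoint.attrArn;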
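Both access point ARNs in this stack follow a fixed pattern built from the region, account, and name. The Python sketch below spells out the two patterns. The helper names are hypothetical. It assumes the standard aws partition (other partitions, such as aws-cn, use a different prefix), and the region and account ID are placeholders.

```python
def access_point_arn(region: str, account: str, name: str) -> str:
    """ARN of a standard S3 Access Point (service segment 's3')."""
    return f"arn:aws:s3:{region}:{account}:accesspoint/{name}"


def object_lambda_access_point_arn(region: str, account: str, name: str) -> str:
    """ARN of an S3 Object Lambda Access Point (service segment 's3-object-lambda')."""
    return f"arn:aws:s3-object-lambda:{region}:{account}:accesspoint/{name}"


# Placeholder region and account ID
print(access_point_arn("us-east-1", "111122223333", "bucket-access-point"))
# arn:aws:s3:us-east-1:111122223333:accesspoint/bucket-access-point
print(object_lambda_access_point_arn("us-east-1", "111122223333", "object-lambda-access-point"))
# arn:aws:s3-object-lambda:us-east-1:111122223333:accesspoint/object-lambda-access-point
```

The only difference between the two is the service segment, which is easy to mix up when typing ARNs by hand.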
3. Creating the Lambda Function
Next, we create the function that Object Lambda invokes to redact the data. It's written in Python, its code lives in the lambda directory, and the handler is index.lambda_handler (the lambda_handler function in index.py).
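import { Code, Function, Runtime } from "aws-cdk-lib/aws-lambda";
import { RetentionDays } from "aws-cdk-lib/aws-logs";
const lambdaFn = new Function(this, "PiiMaskerLambda", {
  // Python 3.9 reached end of life in October 2025, so use a newer runtime
  runtime: Runtime.PYTHON_3_12,
  functionName: "maskerLambdaFunction",
  code: Code.fromAsset("lambda"),
  handler: "index.lambda_handler",
  environment: {
    RAW_BUCKET_NAME: bucket.bucketName, // stays in sync with step 1
  },
  logRetention: RetentionDays.ONE_WEEK,
});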
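Here is a trimmed sketch of the event that Object Lambda passes to the function, and of how the requested key can be recovered from it. The field names under getObjectContext, configuration, and userRequest follow the documented event format. Every value below (account ID, hostname, route, token) is a made-up placeholder, and requested_key is a hypothetical helper. Object keys arrive percent-encoded in the URL path, so decoding with unquote matters for keys that contain spaces or other special characters.

```python
from urllib.parse import unquote, urlparse

# Trimmed, hypothetical Object Lambda event: field names follow the documented
# format, but every value is a placeholder. Real events also carry fields such
# as xAmzRequestId, userIdentity, and protocolVersion.
sample_event = {
    "getObjectContext": {
        "inputS3Url": "https://example.invalid/presigned-url",
        "outputRoute": "example-route",
        "outputToken": "example-token",
    },
    "configuration": {
        "accessPointArn": "arn:aws:s3-object-lambda:us-east-1:111122223333:accesspoint/object-lambda-access-point",
        "supportingAccessPointArn": "arn:aws:s3:us-east-1:111122223333:accesspoint/bucket-access-point",
        "payload": "",
    },
    "userRequest": {
        "url": "https://object-lambda-access-point-111122223333.s3-object-lambda.us-east-1.amazonaws.com/reports/q1%20summary.txt",
        "headers": {},
    },
}


def requested_key(event: dict) -> str:
    """Recover the object key from the original request URL."""
    path = urlparse(event["userRequest"]["url"]).path
    # Keys are percent-encoded in the URL path, so decode them
    return unquote(path.lstrip("/"))


print(requested_key(sample_event))  # reports/q1 summary.txt
```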
4. Granting Access Permissions
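Next, we give the Lambda function read permissions:
- The policy statement grants s3:GetObject only on objects requested through the access point (the <access point ARN>/object/* resource), not on the whole bucket.
- bucket.grantRead(lambdaFn) additionally grants read access to the bucket itself. The handler in step 7 needs it because it fetches the raw object directly by bucket name rather than through the access point.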
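import { PolicyStatement } from "aws-cdk-lib/aws-iam";
lambdaFn.addToRolePolicy(
  new PolicyStatement({
    actions: ["s3:GetObject"],
    resources: [`${s3AccessPointArn}/object/*`], // objects via the access point only
  })
);
// Direct read access to the bucket, used by the handler's get_object call
bucket.grantRead(lambdaFn);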
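The access-point statement above uses a wildcard resource. The sketch below is a rough local approximation of how IAM matches that pattern. It shows that the pattern covers any object key requested through bucket-access-point but does not cover a direct bucket ARN, which is why the handler, reading straight from the bucket, also needs bucket.grantRead. fnmatchcase treats * the way IAM does (any characters, including /), but it also treats [ ] as a character class, which IAM does not. The region and account ID are placeholders.

```python
from fnmatch import fnmatchcase

# Placeholder region and account ID
AP_ARN = "arn:aws:s3:us-east-1:111122223333:accesspoint/bucket-access-point"
POLICY_RESOURCE = f"{AP_ARN}/object/*"


def matches_resource(resource_arn: str, pattern: str = POLICY_RESOURCE) -> bool:
    # Rough approximation of IAM resource matching: case-sensitive, '*' spans
    # any characters including '/', and '?' matches exactly one character.
    return fnmatchcase(resource_arn, pattern)


print(matches_resource(f"{AP_ARN}/object/test.txt"))        # True
print(matches_resource(f"{AP_ARN}/object/reports/q1.txt"))  # True
print(matches_resource("arn:aws:s3:::raw-bucket-with-sensitive-data/test.txt"))  # False
```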
5. Creating the Object Lambda Access Point
Then we configure an Object Lambda Access Point, which invokes the Lambda function whenever an object is retrieved and returns the transformed output. This is the endpoint clients call instead of the bucket. It's linked to:
- The regular S3 Access Point
- The Lambda function (for transformation)
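// Assumes CfnS3ObjectLambdaAccessPoint is an alias for the L1 construct CfnAccessPoint
import { CfnAccessPoint as CfnS3ObjectLambdaAccessPoint } from "aws-cdk-lib/aws-s3objectlambda";
const objectLambdaAccessPoint = new CfnS3ObjectLambdaAccessPoint(
  this,
  "S3ObjectLambdaAccessPoint",
  {
    name: "object-lambda-access-point",
    objectLambdaConfiguration: {
      supportingAccessPoint: s3AccessPointArn,
      transformationConfigurations: [
        {
          actions: ["GetObject"],
          // contentTransformation is untyped in CDK, so it uses the
          // CloudFormation (PascalCase) property names
          contentTransformation: {
            AwsLambda: {
              FunctionArn: lambdaFn.functionArn,
            },
          },
        },
      ],
    },
  }
);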
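The sketch below shows the same configuration outside CDK. It builds the Configuration argument accepted by the boto3 s3control create_access_point_for_object_lambda call, mirroring the CDK properties above. The ARNs are placeholders, and object_lambda_configuration is a hypothetical helper. The actual API call is left commented out because it needs credentials and creates a real resource.

```python
def object_lambda_configuration(supporting_ap_arn: str, function_arn: str) -> dict:
    """Build the Configuration argument for create_access_point_for_object_lambda.

    GetObject requests are routed through the given Lambda function, and the
    supporting access point identifies where the original objects live.
    """
    return {
        "SupportingAccessPoint": supporting_ap_arn,
        "TransformationConfigurations": [
            {
                "Actions": ["GetObject"],
                "ContentTransformation": {
                    "AwsLambda": {"FunctionArn": function_arn},
                },
            }
        ],
    }


# Placeholder ARNs
config = object_lambda_configuration(
    "arn:aws:s3:us-east-1:111122223333:accesspoint/bucket-access-point",
    "arn:aws:lambda:us-east-1:111122223333:function:maskerLambdaFunction",
)
# With credentials configured, creating the access point would look like:
# import boto3
# boto3.client("s3control").create_access_point_for_object_lambda(
#     AccountId="111122223333", Name="object-lambda-access-point", Configuration=config)
```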
6. Allowing Object Lambda to Use Lambda
We explicitly:
- Allow the S3 Object Lambda service to invoke our redaction Lambda
- Allow the Lambda's execution role to return transformed responses with s3-object-lambda:WriteGetObjectResponse
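import { ServicePrincipal } from "aws-cdk-lib/aws-iam";
// Build the ARN from its known format instead of objectLambdaAccessPoint.attrArn:
// the role policy would then depend on the access point, which already depends
// on the function, and that can create a circular dependency
const objectLambdaAccessPointArn = `arn:aws:s3-object-lambda:${this.region}:${this.account}:accesspoint/object-lambda-access-point`;
/**
 * Allow the S3 Object Lambda service to invoke the Lambda function
 */
lambdaFn.addPermission("AllowObjectLambdaInvoke", {
  principal: new ServicePrincipal("s3-object-lambda.amazonaws.com"),
  sourceArn: objectLambdaAccessPointArn,
});
/**
 * Allow Lambda to return a transformed version of the object
 */
lambdaFn.addToRolePolicy(
  new PolicyStatement({
    actions: ["s3-object-lambda:WriteGetObjectResponse"],
    resources: [objectLambdaAccessPointArn],
  })
);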
7. Lambda Handler (Python)
Below is the Python code to redact the sensitive email addresses from the text files.
import os
import re
from urllib.parse import unquote, urlparse
import boto3
s3 = boto3.client("s3")
# Regex to match emails
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9.+_-]+@[a-zA-Z0-9._-]+\.[a-zA-Z]+")
def lambda_handler(event, context):
    # Determine the requested key; keys arrive percent-encoded in the URL path
    requested_url = event["userRequest"]["url"]
    key = unquote(urlparse(requested_url).path.lstrip("/"))
    # Fetch the raw object directly from the bucket (requires bucket.grantRead)
    bucket = os.environ["RAW_BUCKET_NAME"]
    resp = s3.get_object(Bucket=bucket, Key=key)
    body = resp["Body"].read()
    # Only redact emails in .txt files; everything else passes through unchanged
    output = body
    if key.lower().endswith(".txt"):
        try:
            text = body.decode("utf-8")
            redacted, count = EMAIL_PATTERN.subn("[REDACTED_EMAIL]", text)
            output = redacted.encode("utf-8")
            print(f"Redacted {count} email(s) in {key}")
        except UnicodeDecodeError:
            # Not valid UTF-8: return the object unmodified
            print(f"Could not decode {key} as UTF-8; returning it unmodified")
    # Send the (possibly transformed) object back to the caller
    s3.write_get_object_response(
        Body=output,
        RequestRoute=event["getObjectContext"]["outputRoute"],
        RequestToken=event["getObjectContext"]["outputToken"],
    )
    return {"status_code": 200}
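The handler above reads the raw object straight from the bucket by name, which is why step 4 also grants bucket read access. The pattern shown in the AWS documentation instead fetches the original through getObjectContext.inputS3Url. This is a presigned URL that Object Lambda includes in GetObject events; according to the docs, it is signed with the original caller's identity, so the caller's permissions apply. The sketch below shows that alternative using only the standard library. fetch_original is a hypothetical helper. It sends no extra headers, so it assumes a plain GET without signed headers such as Range, which the docs say the function must forward when present.

```python
import urllib.request


def fetch_original(event: dict) -> bytes:
    """Fetch the untransformed object via the presigned inputS3Url in the event."""
    url = event["getObjectContext"]["inputS3Url"]
    # Plain GET: assumes no signed headers (e.g. Range) need to be forwarded
    with urllib.request.urlopen(url) as response:
        return response.read()
```

Inside the handler, body = fetch_original(event) would replace the get_object call. The RAW_BUCKET_NAME variable would then no longer be needed.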
Testing via AWS CLI
- Upload a .txt file containing emails to your S3 bucket.
aws s3 cp <path_to_local_txt_file> s3://raw-bucket-with-sensitive-data/test.txt
- Download the object through the Object Lambda Access Point:
aws s3api get-object \
--bucket arn:aws:s3-object-lambda:<region>:<account>:accesspoint/object-lambda-access-point \
--key test.txt redacted-output.txt
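- Replace <region> and <account> with your AWS Region and account ID.
Output: every email address in redacted-output.txt is replaced with [REDACTED_EMAIL], while test.txt in the bucket stays unchanged.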
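Before deploying, you can sanity-check the redaction pattern locally. The sketch below reuses the handler's regex. redact is a hypothetical helper that uses re.subn so it also reports how many addresses were replaced. The sample addresses are made up.

```python
import re

# Same pattern as the Lambda handler
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9.+_-]+@[a-zA-Z0-9._-]+\.[a-zA-Z]+")


def redact(text: str) -> tuple[str, int]:
    """Return the redacted text and the number of email addresses replaced."""
    return EMAIL_PATTERN.subn("[REDACTED_EMAIL]", text)


sample = "Contact jane.doe+news@example.com or ops@mail.example.org for access."
redacted, count = redact(sample)
print(count)     # 2
print(redacted)  # Contact [REDACTED_EMAIL] or [REDACTED_EMAIL] for access.
```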
Don't Forget to Clean Up! 🚨
Once you've finished testing the redaction flow, delete the resources you provisioned to avoid incurring charges. You can destroy everything with a single command:
cdk destroy
This removes:
- The S3 bucket and objects (if autoDeleteObjects: true)
- Lambda function and associated roles
- S3 Access Point
- S3 Object Lambda Access Point
- IAM policies
⚠️ Note: Because the function uses CDK's logRetention, log events in its CloudWatch log group expire automatically after the retention period (7 days). The log group itself, however, is not deleted by cdk destroy. To clean up completely, delete it with the AWS CLI or console, for example: aws logs delete-log-group --log-group-name /aws/lambda/maskerLambdaFunction
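To script the log cleanup, the minimal boto3 sketch below deletes every log group whose name starts with a given prefix, such as /aws/lambda/maskerLambdaFunction (Lambda's default log group name for this function). delete_log_groups is a hypothetical helper. It takes the CloudWatch Logs client as a parameter so it can be tried against a stub first. The real call is left commented out because it deletes resources.

```python
def delete_log_groups(logs_client, prefix: str) -> list[str]:
    """Delete every CloudWatch Logs group whose name starts with prefix.

    logs_client is expected to be boto3.client("logs"); passing it in lets
    you exercise the function against a stub before touching real resources.
    """
    # Collect names first so deletions don't interfere with pagination
    names = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate(logGroupNamePrefix=prefix):
        names.extend(group["logGroupName"] for group in page["logGroups"])
    for name in names:
        logs_client.delete_log_group(logGroupName=name)
    return names


# Usage (this deletes real log groups, so double-check the prefix first):
# import boto3
# delete_log_groups(boto3.client("logs"), "/aws/lambda/maskerLambdaFunction")
```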