Every Lambda function I have written that touches S3 has the same three lines of plumbing:
```python
s3.download_file(bucket, key, "/tmp/input.csv")
process("/tmp/input.csv", "/tmp/output.csv")
s3.upload_file("/tmp/output.csv", bucket, output_key)
```
Download. Process. Upload. Clean up /tmp. Handle the edge case where /tmp is full from a previous invocation. Handle the edge case where the download fails halfway. Handle the edge case where you run out of the 10 GB ephemeral limit because someone uploaded a file larger than you expected.
S3 Files makes all of that go away. You mount the bucket at /mnt/workspace and use open(). The file is right there. You write the output. It syncs to S3.
The Problem It Solves
Lambda functions that process files from S3 have always followed the same ritual:
- Download the object from S3 to /tmp
- Process it with whatever tool expects a file path
- Upload the result back to S3
- Clean up /tmp so the next invocation doesn't run out of space
This works. It also creates problems.
/tmp is ephemeral. It's limited to 10 GB. It's not shared between invocations on different execution environments. If your function fails halfway through processing, you retry the entire download. If multiple functions need the same reference file, each one downloads its own copy.
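The /tmp edge cases above usually turn into guard code along these lines. This is a minimal sketch, not something from an AWS SDK: `has_room_for`, `clear_scratch`, and the `headroom` factor are hypothetical names, and the object size is assumed to come from a prior metadata lookup.

```python
import shutil
from pathlib import Path


def has_room_for(object_size: int, scratch_dir: str = "/tmp", headroom: float = 2.0) -> bool:
    """Return True if scratch_dir can hold the input plus an output of similar size.

    headroom is a hypothetical safety factor: 2.0 assumes the processed
    output is roughly as large as the input.
    """
    free_bytes = shutil.disk_usage(scratch_dir).free
    return object_size * headroom <= free_bytes


def clear_scratch(scratch_dir: str = "/tmp") -> None:
    """Delete leftover top-level files from a previous invocation."""
    for entry in Path(scratch_dir).iterdir():
        if entry.is_file():
            entry.unlink()
```

None of this is processing logic, which is the point: it exists only because the data has to be copied somewhere before a tool can read it.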
For a single CSV transform, the overhead is tolerable. For a pipeline that processes PDFs, images, video, or runs tools like ffmpeg, imagemagick, trivy, or semgrep, the download-process-upload loop becomes the majority of your code and the majority of your execution time.
S3 Files eliminates the loop. Your function mounts the bucket and reads files directly.
How It Works
S3 Files is a managed NFS v4.1+ file system built on Amazon EFS that presents your S3 bucket as a directory tree. When you mount it on a Lambda function, the function sees files and directories at /mnt/your-path. Under the hood, the data still lives in S3.
The architecture uses what AWS calls "stage and commit":
- Your function reads and writes files through the NFS mount
- An EFS caching layer stores actively accessed data for low-latency access (~1ms)
- Changes written through the mount are exported back to S3 within minutes
- Changes made directly through the S3 API appear in the file system within seconds (sometimes longer)
The two layers are explicitly separate. The file system side gives you NFS close-to-open consistency. The S3 side gives you standard strong consistency. Each preserves its own semantics.
For large sequential reads (1 MiB or larger), S3 Files bypasses the cache entirely and streams data directly from S3 using parallel GET requests. This means ML training data, large CSVs, media files, and Parquet datasets get full S3 throughput without paying the cache premium. Files smaller than 128 KB (configurable) are the ones that get stored on the high-performance layer for low-latency access.
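The text above does not say whether the 1 MiB threshold applies to each read request or to the file as a whole. Assuming it applies per read, a function that wants the direct path would read front to back in chunks of at least that size. The sketch below hashes a file that way; `sha256_of` and `CHUNK_SIZE` are illustrative names, and whether the NFS client actually issues requests of that size also depends on mount options this code does not control.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # 1 MiB, the threshold quoted above


def sha256_of(path: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a file front to back using fixed-size sequential reads."""
    digest = hashlib.sha256()
    # buffering=0 passes each read() to the OS without Python-level buffering
    with path.open("rb", buffering=0) as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```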
What It Looks Like
The old way:
```python
import boto3
import csv
import os

s3 = boto3.client("s3")


def handler(event, context):
    bucket = event["bucket"]
    key = event["key"]
    output_key = key.replace("incoming/", "processed/")

    s3.download_file(bucket, key, "/tmp/input.csv")

    with open("/tmp/input.csv", newline="") as src, \
            open("/tmp/output.csv", "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=["email", "account_id"])
        writer.writeheader()
        for row in reader:
            writer.writerow({
                "email": row["email"].strip().lower(),
                "account_id": row["account_id"].strip()
            })

    s3.upload_file("/tmp/output.csv", bucket, output_key)
    os.remove("/tmp/input.csv")
    os.remove("/tmp/output.csv")
    return {"output": output_key}
```
The new way:
```python
import csv
from pathlib import Path

WORKSPACE = Path("/mnt/workspace")


def handler(event, context):
    source = WORKSPACE / event["relative_input_path"]
    target = WORKSPACE / event["relative_output_path"]
    target.parent.mkdir(parents=True, exist_ok=True)

    with source.open(newline="") as src, target.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=["email", "account_id"])
        writer.writeheader()
        for row in reader:
            writer.writerow({
                "email": row["email"].strip().lower(),
                "account_id": row["account_id"].strip()
            })

    return {"output_path": str(target)}
```
No boto3. No temporary files. No cleanup. The output is written directly to the mounted S3 bucket and syncs back to S3 automatically.
The real win shows up when the processing step isn't a simple CSV transform. If your function shells out to git, ripgrep, ffmpeg, trivy, or any tool that expects a filesystem path, a mounted workspace is simpler than teaching every tool to speak S3.
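Shelling out against a mounted path could look roughly like this. It is a sketch under assumptions: `run_tool` and `count_lines` are hypothetical helpers, and the Python interpreter stands in for a real tool such as ripgrep or ffmpeg so the example runs anywhere.

```python
import subprocess
import sys
from pathlib import Path


def run_tool(command: list[str], workdir: Path, timeout: int = 60) -> str:
    """Run an external tool inside workdir and return its stdout.

    Raises subprocess.CalledProcessError if the tool exits non-zero.
    """
    result = subprocess.run(
        command,
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout


def count_lines(workdir: Path, filename: str) -> int:
    # Stand-in for a real tool: any program that takes a file path works the same way
    script = "import sys; print(sum(1 for _ in open(sys.argv[1])))"
    return int(run_tool([sys.executable, "-c", script, filename], workdir))
```

In a handler, `workdir` would be a directory under the mount, for example `Path("/mnt/workspace") / "repo"`. The tool needs no S3 credentials or SDK of its own.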
The Setup
S3 Files on Lambda requires more infrastructure than a plain S3 trigger. Here's what you need:
Requirements:
- An S3 file system created on a general purpose bucket (S3 versioning must be enabled)
- Mount targets in the same VPC and Availability Zones as your Lambda function
- Security groups allowing NFS traffic on port 2049
- Lambda function connected to the VPC
- Execution role with s3files:ClientMount (and s3files:ClientWrite for write access)
- s3:GetObject and s3:GetObjectVersion for direct read optimization
- Function memory set to 512 MB or higher (required for direct reads from S3)
The SAM template:
```yaml
Resources:
  ProcessingFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: FileProcessorFunction
      CodeUri: ./src
      Handler: index.handler
      Runtime: python3.13
      MemorySize: 512
      Timeout: 300
      VpcConfig:
        SecurityGroupIds:
          - !Ref LambdaSecurityGroup
        SubnetIds:
          - !Ref PrivateSubnet1
          - !Ref PrivateSubnet2
      FileSystemConfigs:
        - Arn: !GetAtt S3FilesAccessPoint.Arn
          LocalMountPath: /mnt/workspace
      Policies:
        - Statement:
            - Effect: Allow
              Action:
                - s3files:ClientMount
                - s3files:ClientWrite
              Resource: "*"
            - Effect: Allow
              Action:
                - s3:GetObject
                - s3:GetObjectVersion
              Resource: !Sub "arn:aws:s3:::${BucketName}/*"
```
The VPC requirement is the biggest change from a standard Lambda + S3 setup. If your function isn't already in a VPC, you need to add subnets, security groups, and NAT gateways (if the function also needs internet access). That's not trivial for existing deployments.
When to Use S3 Files vs. the Old Pattern
Use S3 Files when:
- Your function processes files with tools that expect filesystem paths (ffmpeg, imagemagick, PDF libraries, git, security scanners)
- Multiple Lambda functions need shared access to the same working directory
- You are tired of managing /tmp size limits and cleanup logic
- The function reads large reference datasets that don't change between invocations
- You want to eliminate the download-process-upload ceremony
Keep using GetObject + /tmp when:
- Your function reads one object, transforms it in memory, and writes one object back
- The function is a simple event handler that processes JSON payloads
- You need the lowest possible cold start latency (VPC adds ~100-200ms)
- Your function doesn't need filesystem semantics at all
- The workload is latency-sensitive and can't tolerate the VPC mount dependency
The mental model is straightforward. If your code has download_file followed by upload_file and the processing step uses file paths, S3 Files removes that plumbing. If your code streams objects through memory without touching the filesystem, S3 Files adds complexity for no benefit.
What You Pay
S3 Files pricing has three layers on top of your existing S3 storage costs:
| Component | Rate | What triggers it |
|---|---|---|
| S3 Standard storage | ~$0.023/GB-month | All data in the bucket (unchanged) |
| High-performance cache | ~$0.30/GB-month | Only actively cached data, not the whole bucket |
| Data access (reads) | ~$0.03/GB | Small file reads from cache |
| Data access (writes) | ~$0.06/GB | Writes through the mount |
The critical detail: large sequential reads (1 MiB+) bypass the cache entirely and cost only standard S3 GET request pricing. No S3 Files surcharge.
Practical example: You have a 1 TB bucket. Your Lambda functions actively work with 50 GB of files through the mount. Most reads are large Parquet files.
| Component | Cost |
|---|---|
| S3 storage (1 TB) | $23.55 |
| Cache storage (50 GB active) | $15.00 |
| Data access (small file reads) | ~$0.50 |
| Data access (writes, 20 GB) | ~$1.20 |
| Total | ~$40/month |
The same 1 TB on EFS would cost ~$300/month. S3 Files costs a fraction because you only pay the cache premium on the active working set, not the entire dataset.
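The arithmetic behind that table can be reproduced with a few lines. This is a rough sketch using the approximate rates quoted above; `monthly_cost` is a hypothetical helper, and the small-read volume is an assumption (about 16.7 GB is what the table's ~$0.50 implies at $0.03/GB). GET request charges for large direct reads are not modeled.

```python
# Rates copied from the pricing table above (approximate, USD)
S3_STORAGE_PER_GB = 0.023
CACHE_PER_GB = 0.30
READ_PER_GB = 0.03
WRITE_PER_GB = 0.06


def monthly_cost(bucket_gb, cached_gb, small_read_gb, write_gb):
    """Return a per-component breakdown plus the total, in USD per month."""
    parts = {
        "s3_storage": bucket_gb * S3_STORAGE_PER_GB,
        "cache_storage": cached_gb * CACHE_PER_GB,
        "reads": small_read_gb * READ_PER_GB,
        "writes": write_gb * WRITE_PER_GB,
    }
    parts["total"] = sum(parts.values())
    return parts


# 1 TB bucket, 50 GB cached, 20 GB written; read volume is assumed (see above)
estimate = monthly_cost(bucket_gb=1024, cached_gb=50, small_read_gb=16.7, write_gb=20)

# The EFS comparison: the whole terabyte billed at the $0.30/GB rate
efs_equivalent = 1024 * CACHE_PER_GB
```

With these inputs the total comes to roughly $40 and the EFS figure to roughly $307, matching the numbers in the text.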
Small operations have metering minimums. Data access operations are metered at a minimum size (reported as 32 KB in early testing). Reading a 1-byte config file gets metered for more than 1 byte. For workloads with millions of tiny metadata-heavy operations, those minimums add up.
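To see how quickly the minimum dominates, here is a small sketch. It assumes each operation is simply rounded up to the minimum (the exact metering rule is not stated here), and it uses the 32 KB figure reported in early testing; `metered_bytes` is an illustrative name.

```python
MIN_METERED_BYTES = 32 * 1024  # 32 KB, the figure reported in early testing


def metered_bytes(operation_sizes, minimum=MIN_METERED_BYTES):
    """Sum the bytes billed when every operation is rounded up to the minimum."""
    return sum(max(size, minimum) for size in operation_sizes)


# One million reads of a 1-byte file
actual = 1_000_000 * 1
billed = metered_bytes([1] * 1_000_000)
inflation = billed / actual  # each byte actually read is billed as 32 KB
```

Under that assumption, about 1 MB of real data is metered as roughly 32.8 billion bytes.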
Things to Know Before You Build
The sync window isn't instant. Changes written through the mount are exported back to S3 within minutes. Changes made directly in S3 appear in the file system within seconds, but can take a minute or longer. If your downstream system polls S3 for new objects, account for this lag. There's no manual flush API.
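For the S3-to-mount direction, a function that expects an object uploaded through the S3 API may need to wait for it to show up. A minimal polling sketch follows; `wait_for_path` is a hypothetical helper, and the default timeout and interval are placeholders rather than recommended values.

```python
import time
from pathlib import Path


def wait_for_path(path: Path, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll until path exists or timeout seconds elapse. Returns True if found."""
    deadline = time.monotonic() + timeout
    while True:
        if path.exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Waiting inside a Lambda function is billed time, so a short timeout followed by a retry from the caller may be the cheaper design.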
Renames are expensive. S3 has no native rename. Renaming a file through the mount means copy + delete at the S3 layer. For a single file, fine. For a directory with 50,000 files, that's 50,000 copy-and-delete operations. Write final output paths directly. Don't use directory renames as workflow commits.
S3 versioning is required. You can't create an S3 file system on a bucket without versioning enabled. This increases storage costs from additional versions.
Glacier storage classes are incompatible. S3 Standard, Intelligent-Tiering, and Infrequent Access all work. Glacier does not.
No hard links. Symbolic links work. If your tool relies on hard links (some build systems and package managers do), it will break.
1,024-byte key length limit. Deeply nested directories with long filenames can hit this ceiling. Measure your path lengths before committing to a directory structure.
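Measuring path lengths is straightforward, with one catch: S3 counts key length in UTF-8 bytes, not characters. A small sketch, with `key_length` and `fits_key_limit` as illustrative names:

```python
MAX_KEY_BYTES = 1024  # the limit stated above


def key_length(relative_path: str) -> int:
    """Length of the path in UTF-8 bytes, which is how S3 measures key length."""
    return len(relative_path.encode("utf-8"))


def fits_key_limit(relative_path: str, limit: int = MAX_KEY_BYTES) -> bool:
    """True if the path, relative to the bucket root, fits within the limit."""
    return key_length(relative_path) <= limit
```

A 600-character filename made entirely of two-byte characters is already over the limit, so paths with non-ASCII names deserve a check even when they look short.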
Conflicts: S3 wins. If the same file is modified through both the mount and the S3 API simultaneously, the S3 version is treated as the source of truth. The file system version goes to a lost+found directory. Pick one writer per path.
Custom S3 metadata isn't visible. If your application sets x-amz-meta headers through the S3 API, those values don't appear as extended attributes on mounted files. POSIX attributes only.
Cache expiration defaults to 30 days. Data stays in the high-performance layer for 30 days after last access. For batch workloads that touch files once, drop this to 1-2 days to reduce cache storage costs.
S3 Files vs. EFS vs. Mountpoint
Lambda already supported EFS mounts, and Mountpoint for Amazon S3 exists for read-heavy workloads. Here's when each makes sense:
| If you need... | Use this | Why |
|---|---|---|
| File paths backed by S3 data | S3 Files | S3 stays the source of truth, cache only for active data |
| General shared POSIX storage independent of S3 | EFS | Mature, no sync lag, all data is hot |
| Read-only high-throughput access to S3 | Mountpoint for S3 | Simpler, no EFS layer, no write support needed |
| Enterprise NAS features (ONTAP, Windows) | FSx | Protocol-specific workloads |
The key difference between S3 Files and EFS: with EFS, you pay $0.30/GB for everything stored. With S3 Files, you pay $0.30/GB only for the active working set and $0.023/GB for everything else in S3. The cost advantage grows as total data increases relative to the active subset.
The key difference between S3 Files and Mountpoint: Mountpoint is a FUSE client with limited write support and, at most, an optional client-side cache that you configure yourself. S3 Files gives you full read-write NFS semantics with a managed cache. If you only need to read large files from S3, Mountpoint is simpler and cheaper.
A Practical Example: Image Processing Pipeline
A common Lambda pattern: S3 trigger fires when an image is uploaded, function generates thumbnails and optimized versions.
The old way requires downloading the source image, processing it with Pillow or ImageMagick, writing multiple outputs to /tmp, then uploading each one back to S3:
```python
import boto3
from PIL import Image
import os

s3 = boto3.client("s3")
SIZES = {"thumb": (150, 150), "medium": (800, 600), "large": (1920, 1080)}


def handler(event, context):
    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    key = event["Records"][0]["s3"]["object"]["key"]

    s3.download_file(bucket, key, "/tmp/source.jpg")
    img = Image.open("/tmp/source.jpg")

    for name, size in SIZES.items():
        resized = img.copy()
        resized.thumbnail(size)
        output_path = f"/tmp/{name}.jpg"
        resized.save(output_path, "JPEG", quality=85)
        s3.upload_file(output_path, bucket, f"processed/{name}/{key}")
        os.remove(output_path)

    os.remove("/tmp/source.jpg")
    return {"processed": list(SIZES.keys())}
```
With S3 Files:
```python
from PIL import Image
from pathlib import Path

WORKSPACE = Path("/mnt/workspace")
SIZES = {"thumb": (150, 150), "medium": (800, 600), "large": (1920, 1080)}


def handler(event, context):
    key = event["Records"][0]["s3"]["object"]["key"]
    source = WORKSPACE / key
    img = Image.open(source)

    for name, size in SIZES.items():
        output = WORKSPACE / "processed" / name / key
        output.parent.mkdir(parents=True, exist_ok=True)
        resized = img.copy()
        resized.thumbnail(size)
        resized.save(output, "JPEG", quality=85)

    return {"processed": list(SIZES.keys())}
```
Half the code. No boto3 import. No temporary file management. The source image is read directly from the mount. The outputs are written directly to the mount and sync to S3 within minutes.
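One detail worth handling when an event key becomes a file path: S3 event notifications URL-encode object keys (a space arrives as `+`), and a key should never resolve outside the mount. The helper below is a sketch of both checks; `path_for_key` is a hypothetical name, and it needs Python 3.9 or later for `is_relative_to`.

```python
from pathlib import Path
from urllib.parse import unquote_plus


def path_for_key(workspace: Path, raw_key: str) -> Path:
    """Map an object key from an S3 event onto a path inside workspace."""
    key = unquote_plus(raw_key)  # "my+photo%281%29.jpg" -> "my photo(1).jpg"
    root = workspace.resolve()
    candidate = (root / key).resolve()
    # Reject keys such as "../../etc/passwd" that would escape the mount
    if not candidate.is_relative_to(root):
        raise ValueError(f"key escapes workspace: {raw_key!r}")
    return candidate
```

In the handler above, `source = path_for_key(WORKSPACE, key)` would take the place of `WORKSPACE / key`.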
Multi-Function Shared Workspace
The pattern that makes S3 Files most interesting isn't single-function file processing. It's multiple functions sharing a workspace.
Before S3 Files, if three Lambda functions needed to collaborate on the same set of files, each one had to download from S3, do its work, upload results, and the next function would download those results. With S3 Files, they all mount the same bucket and read each other's output directly.
```text
Function A (security scan)
  reads  /mnt/workspace/repo/
  writes /mnt/workspace/reports/security.json

Function B (test analysis)
  reads  /mnt/workspace/repo/
  writes /mnt/workspace/reports/tests.json

Function C (merge reports)
  reads  /mnt/workspace/reports/*.json
  writes /mnt/workspace/final/summary.md
```
No intermediate S3 uploads between steps. No coordination logic to pass object keys between functions. The workspace is the coordination mechanism.
The rule for shared workspaces: one writer per file path. Don't have two functions writing to the same file. Use worker-specific output paths and let the orchestrator merge.
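The one-writer-per-path rule can be sketched as follows, loosely mirroring Functions A, B, and C above. The helper names and the shape of the report contents are assumptions made for illustration; only the directory layout comes from the example.

```python
import json
from pathlib import Path


def write_report(workspace: Path, worker: str, findings: dict) -> Path:
    """Each worker writes only to its own file: reports/<worker>.json."""
    target = workspace / "reports" / f"{worker}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(findings))
    return target


def merge_reports(workspace: Path) -> Path:
    """The single merging function reads every report and writes one summary."""
    lines = ["# Summary", ""]
    for report in sorted((workspace / "reports").glob("*.json")):
        findings = json.loads(report.read_text())
        lines.append(f"- {report.stem}: {len(findings)} entries")
    summary = workspace / "final" / "summary.md"
    summary.parent.mkdir(parents=True, exist_ok=True)
    summary.write_text("\n".join(lines) + "\n")
    return summary
```

Because every path has exactly one writer, no locking is needed. The merge step still has to run after the workers have finished and closed their files, which remains the orchestrator's job.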
Who This Is For
S3 Files is for Lambda functions that have been pretending S3 objects are files. If your code downloads an object, gives it a file path, processes it with a tool that expects a file, and uploads the result, S3 Files removes the pretending.
The strongest use cases:
Media processing. Image resizing, video transcoding, audio conversion. These tools all expect file paths.
Document processing. PDF extraction, Office document conversion, OCR pipelines. Libraries like pdfplumber, python-docx, and Tesseract work with files.
Code analysis. Security scanners, linters, dependency checkers. Tools like trivy, semgrep, bandit, and eslint expect a directory to scan.
ML inference with reference data. Models that load large reference files (embeddings, lookup tables, feature stores) benefit from the shared mount. Load once, use across invocations.
AI agent workspaces. Agents that use filesystem tools (cat, grep, ls, find) can work directly on S3 data without custom S3 API wrappers.
The weakest use cases: simple JSON transforms, single-object streaming, anything that never touches the filesystem.
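For the reference-data case, "load once" in practice means caching the parsed file at module level, so warm invocations in the same execution environment skip the read. A minimal sketch, assuming a JSON lookup table; the event fields `reference_path` and `lookup_key` are hypothetical.

```python
import json
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_reference(path: str) -> dict:
    """Read and parse a JSON reference file once per execution environment."""
    return json.loads(Path(path).read_text())


def handler(event, context):
    # "reference_path" and "lookup_key" are hypothetical event fields
    table = load_reference(event["reference_path"])
    return {"value": table.get(event["lookup_key"])}
```

The trade-off is staleness: a warm environment keeps serving the cached copy even after the file changes, so this fits data that rarely changes or that is versioned by path.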
Getting Started
```shell
# Create the file system on your bucket (verify exact CLI syntax against docs)
aws s3api create-file-system --bucket my-bucket --file-system-name my-workspace

# Create mount targets in your VPC subnets
aws s3api create-mount-target \
  --file-system-id fs-abc123 \
  --subnet-id subnet-xyz \
  --security-groups sg-nfs-access

# Attach to your Lambda function via console or IaC
# Configuration > File systems > Add file system > S3 Files
```
The Lambda S3 Files documentation covers the full setup. The S3 Files user guide covers file system creation, access points, and synchronization configuration.
Should You Migrate?
For 20 years, the answer to "can I mount S3?" was no. Now it's yes, and the implementation is good enough for production Lambda workloads.
The download-process-upload pattern isn't gone from every codebase. It still makes sense for simple object transforms. But for file-heavy Lambda functions that spend more lines on S3 plumbing than on actual processing logic, S3 Files is a real simplification.
Mount the bucket. Read the file. Write the output.




