Dzhuneyt

Posted on • Originally published at dzhuneyt.com

AWS S3 Files: The Missing Conversation

Amazon S3 Files launched on April 7, and the reaction was immediate — nearly universal excitement across the AWS engineering community. And honestly, the excitement is warranted.

Andrew Warfield, VP and Distinguished Engineer at Amazon, framed the vision clearly: "With Tables, Vectors, and now Files, we are consciously changing the surface of S3. It's not just objects — it's evolving to make sure you can work with your data however you need to." That's not a minor product update. That's a platform shift.

But as I scrolled through dozens of posts and articles, something stood out: almost nobody had actually tested it. The conversation was full of explainers and excitement, light on hands-on findings. So I spun up a throwaway AWS account, provisioned the infrastructure, and ran it through real file operations. Here's what the conversation is missing.


What S3 Files Actually Is

The short version: S3 Files adds an NFS 4.1/4.2 file system interface on top of your S3 buckets. You mount a bucket, and your tools — cat, ls, Python's open(), anything that speaks POSIX — can read and write S3 data directly. Under the hood, it's built on Amazon EFS technology. Your data stays in S3. Actively used files get cached in a high-performance storage layer for low-latency access.
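That POSIX surface means application code needs no SDK at all. A minimal sketch, assuming a mount at /mnt/s3files (a hypothetical path; for a dry run, any local directory behaves identically from the application's point of view):

```python
import tempfile
from pathlib import Path

def roundtrip(mount_dir: str) -> str:
    """Write through the mount with plain POSIX I/O, then read it straight back."""
    path = Path(mount_dir) / "hello.txt"
    path.write_text("written through the NFS mount\n")  # becomes an S3 object
    return path.read_text()

# In production, mount_dir would be the mount point, e.g. "/mnt/s3files"
# (hypothetical path). Any local directory works as a stand-in:
print(roundtrip(tempfile.mkdtemp()), end="")
```

No boto3, no credentials plumbing in the application layer: the mount handles the S3 side.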

Luc van Donkersgoed, Principal Engineer at PostNL, raised the right question: "Does this finally settle the question 'is it a file or is it an object'?" The answer, as far as I can tell: it's both now, simultaneously, and that's the whole point.

S3 Files is the third leg of a deliberate strategy. S3 Tables gave you structured query access. S3 Vectors gave you embedding storage. S3 Files gives you POSIX. The pattern is clear — S3 is becoming a universal data substrate, not just an object store.

Understanding what it is on paper is straightforward. The interesting part is what happens when you actually use it.


What I Found When I Tested It

The EFS DNA Is More Than Cosmetic

The first thing you'll notice is that S3 Files doesn't just borrow from EFS conceptually — it's deeply woven into the EFS ecosystem.

The mount helper binary (mount.s3files) ships inside the amazon-efs-utils package. There's no separate amazon-s3-files-utils. The IAM trust principal you need in your role policy is elasticfilesystem.amazonaws.com — nothing with "s3files" in the name. When you inspect the mount, it shows up as 127.0.0.1:/ type nfs4 — the same local NFS proxy pattern that EFS uses.

This has practical implications. If you've operated EFS before, your mental model transfers directly: mount targets, security groups on port 2049, access points, the client permission model (elasticfilesystem:ClientMount, ClientWrite, ClientRootAccess). If you haven't, expect a learning curve that the launch announcement doesn't prepare you for.
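You can confirm the local-proxy pattern programmatically from the mount table. A small sketch that parses /proc/mounts-style text; the sample line mirrors the 127.0.0.1:/ nfs4 entry described above, and on a real host you would pass the contents of /proc/mounts instead:

```python
def find_mount(mount_point: str, mounts_text: str):
    """Return (source, fstype) for mount_point from /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            return fields[0], fields[2]
    return None

# Sample entry matching the pattern described above (local NFS proxy, as with
# EFS). On a real host, pass open("/proc/mounts").read() instead.
sample = "127.0.0.1:/ /mnt/s3files nfs4 rw,relatime,vers=4.2 0 0"
print(find_mount("/mnt/s3files", sample))  # → ('127.0.0.1:/', 'nfs4')
```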

The Performance Story Has Two Chapters

The headline numbers in the announcement — "multiple terabytes per second of aggregate read throughput" — are real, but they describe one specific scenario. The full picture is more nuanced.

My setup was minimal — a single t4g.micro ARM instance in us-east-1, mounting the file system over NFS in a single availability zone. Not a performance-optimized configuration by any means, and throughput on larger instances or multi-AZ setups would differ. But it gives you a baseline.

The first (cold) read of a 5MB file completed at 17-18 MB/s — respectable, but not the headline number. The second read of the same file: up to 3.1 GB/s across multiple runs — over 100x faster. The intelligent caching layer had kicked in.

Small file reads were consistently in the 2-6 millisecond range. Writing 100 small files took under a second.

The key mechanic here is the 128KB threshold (configurable). Files at or below this size get loaded into the high-performance storage layer with sub-millisecond to single-digit millisecond latencies. Files above it are streamed directly from S3 — fast in aggregate, but not cached.

This means performance depends entirely on your access pattern and whether the cache is warm. Repeated reads of working-set data will be blazing fast. One-off reads of large cold files will feel like S3 with extra steps. Neither is wrong — but the headline doesn't distinguish between them.
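A cold-versus-warm comparison is easy to reproduce. The sketch below times two back-to-back reads of the same file; pointed at a file on the mount it surfaces the caching effect described above, while run locally as written (against a throwaway 5 MB file) it only exercises the OS page cache:

```python
import os, tempfile, time

def timed_read(path: str) -> tuple:
    """Read a file end to end, returning (elapsed seconds, bytes read)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        n = len(f.read())
    return time.perf_counter() - start, n

# On a real mount, point this at a file in the mount, e.g.
# "/mnt/s3files/data.bin" (hypothetical path). A throwaway 5 MB local file
# keeps the sketch self-contained:
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(os.urandom(5 * 1024 * 1024))

cold, size = timed_read(path)  # first read: streamed from S3 on a real mount
warm, _ = timed_read(path)     # repeat read: served by the caching layer
print(f"{size} bytes: cold {cold:.4f}s, warm {warm:.4f}s")
```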

File System Semantics: Mostly Complete, With Gaps

I tested the operations you'd actually run in production:

| Operation | Result |
| --- | --- |
| Write, append | Works |
| Rename (`mv`) | Works — atomic |
| `mkdir -p` | Works |
| `chmod` | Works — POSIX permissions preserved |
| Symlinks | Works |
| File locking (`flock`) | Works |
| Hard links | Does not work — "Too many links" error |

Read-after-write consistency held in every test I ran. I wrote a file and immediately read it back — no stale data, no lag. This is explicitly documented as a guarantee, and my testing confirmed it.

A few less obvious details: symlinks do work at the filesystem level, but when they sync to S3, they become regular objects containing the target path — not true symlinks. There's also a hidden .s3files-lost+found-<fs-id> directory that appears at the mount root.
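These checks are straightforward to script against any mounted directory. The probe below exercises the same operations as the table; note that on a local file system the hard-link step succeeds, whereas on S3 Files it fails with the "Too many links" error noted above:

```python
import fcntl, os, tempfile

def probe(root: str) -> dict:
    """Exercise the file system operations from the table and record outcomes."""
    results = {}
    a = os.path.join(root, "a.txt")

    with open(a, "w") as f:                       # write
        f.write("v1\n")
    with open(a, "a") as f:                       # append
        f.write("v2\n")
    results["write/append"] = "ok"

    b = os.path.join(root, "b.txt")
    os.rename(a, b)                               # mv (atomic rename)
    results["rename"] = "ok" if os.path.exists(b) else "failed"

    os.makedirs(os.path.join(root, "x", "y"))     # mkdir -p
    results["mkdir -p"] = "ok"

    os.chmod(b, 0o640)                            # POSIX permissions
    results["chmod"] = oct(os.stat(b).st_mode & 0o777)

    os.symlink(b, os.path.join(root, "link"))     # symlink
    results["symlink"] = "ok"

    with open(b) as f:                            # advisory lock, like flock(1)
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.flock(f, fcntl.LOCK_UN)
    results["flock"] = "ok"

    try:                                          # hard link: fails on S3 Files
        os.link(b, os.path.join(root, "hard"))
        results["hardlink"] = "ok"
    except OSError as e:
        results["hardlink"] = f"failed: {e}"
    return results

print(probe(tempfile.mkdtemp()))
```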

The Sync Story Is Asymmetric

S3 Files syncs bidirectionally between the filesystem and the S3 bucket. But the two directions behave very differently.

Filesystem → S3 is fast. A single file written through the mount appeared as an S3 object within 1-2 seconds. A batch of 5 files written in quick succession took about a minute to fully sync — suggesting some batching or queuing under the hood, though a larger sample would be needed to characterize this precisely.

S3 → Filesystem is slower. A file uploaded directly to S3 via the API took roughly 45 seconds to appear on the mounted filesystem. This is the direction that matters if you have external processes writing to S3 and expect the mounted view to reflect those changes promptly.

Both directions are automatic and require no configuration. But if your architecture relies on S3 event notifications firing immediately after a filesystem write, or on the mounted view reflecting S3 uploads in near-real-time, you need to account for these delays.
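Practically, that means code watching the mounted view for an external S3 upload should poll with a generous timeout rather than assume immediacy. A minimal sketch; the 90-second default leaves headroom over the roughly 45-second lag observed above:

```python
import os, time

def wait_for_file(path: str, timeout: float = 90.0, interval: float = 1.0) -> bool:
    """Poll until path exists on the mounted view, or give up after timeout.

    The default timeout is deliberately generous: objects PUT directly to S3
    took roughly 45 seconds to show up on the mount in testing.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    return False
```

The same shape works for the other direction, too, if downstream consumers read via the S3 API after a filesystem write.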

Container Integration Is Seamless

One finding that pleasantly surprised me: Docker containers can access the S3 Files mount via a simple bind mount. No FUSE configuration, no special container capabilities, no sidecar.

I tested Alpine containers reading S3 data, writing files back, and a Python container running json.load() directly on a JSON file stored in S3 — all through a straightforward -v /mnt/s3files:/data bind mount. Writes from inside the container were immediately visible on the host and synced to S3 within seconds.

This is a meaningful practical advantage over Mountpoint for S3, which requires FUSE support inside the container. With S3 Files, the NFS mount on the host is transparent to the container — it just sees a normal directory.
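That transparency keeps container code trivially portable: inside a container started with -v /mnt/s3files:/data, reading S3-backed JSON is just json.load(). A sketch of that code path, using a local stand-in directory so it runs anywhere (all paths hypothetical):

```python
import json, os, tempfile

def load_config(data_dir: str) -> dict:
    """Plain json.load() on a file that lives in S3, via the mounted directory."""
    with open(os.path.join(data_dir, "config.json")) as f:
        return json.load(f)

# Inside a container started with `-v /mnt/s3files:/data`, data_dir would be
# "/data" (hypothetical paths). A local stand-in shows the same code path:
d = tempfile.mkdtemp()
with open(os.path.join(d, "config.json"), "w") as f:
    json.dump({"region": "us-east-1"}, f)
print(load_config(d))  # → {'region': 'us-east-1'}
```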


What's Not Ready Yet

Versioning is mandatory. Your S3 bucket must have versioning enabled before you can create a file system on it. The announcement doesn't mention this. I discovered it when my create-file-system call failed with a validation error. For large, high-churn buckets, mandatory versioning has real cost and lifecycle implications that deserve consideration.

No Infrastructure as Code support at launch. There's no CloudFormation resource type, no CDK construct, no Terraform resource. This will almost certainly come soon, but if you're evaluating today, you're working with the CLI, the SDK, or rolling your own custom resources. Something to keep in mind, not a dealbreaker.

The IAM setup is non-obvious. The trust policy uses EFS service principals (elasticfilesystem.amazonaws.com) with S3 Files-specific conditions (arn:aws:s3files:REGION:ACCOUNT:file-system/*). Get this wrong, and the file system silently enters an "error" state — there's no clear failure at creation time. It reports "creating" and then, minutes later, quietly flips to "error" with an access denied message. This cost me real debugging time, and it's the kind of operational sharp edge worth knowing about upfront.
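For reference, the trust policy takes roughly this shape. The elasticfilesystem.amazonaws.com principal and the s3files ARN format are the ones described above; the ArnLike / aws:SourceArn condition key is the standard confused-deputy pattern and an assumption here, so verify the exact keys against the official documentation:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "elasticfilesystem.amazonaws.com" },
    "Action": "sts:AssumeRole",
    "Condition": {
      "ArnLike": { "aws:SourceArn": "arn:aws:s3files:REGION:ACCOUNT:file-system/*" }
    }
  }]
}
```

Because the failure surfaces minutes later as a silent "error" state, it's worth double-checking this policy before creating the file system rather than after.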


Where This Actually Fits

To ground this, here's how S3 Files compares to the alternatives:

| | Mountpoint for S3 | S3 Files | EFS |
| --- | --- | --- | --- |
| Write support | Append-only | Full | Full |
| Caching | None | Intelligent (128KB threshold) | Full |
| Protocol | FUSE | NFS 4.1/4.2 | NFS 4.1 |
| File locking | No | Yes | Yes |
| Data location | S3 | S3 | EFS storage |
| Container support | Needs FUSE in container | Bind mount | Bind mount |
| IaC support | Yes | Not yet | Yes |

The bigger picture is the one Warfield articulated. S3 is systematically absorbing the interfaces that used to require separate services. The file-versus-object debate may genuinely be winding down — not because one paradigm won, but because the boundary between them is dissolving.

The product is impressive, and the strategic direction is clear. But the operational tooling — IaC, error reporting, observability — hasn't caught up with the feature yet. If you're evaluating S3 Files today, it's worth testing, worth understanding, and worth giving a bit of time to mature before putting it on a critical path.
