Table of Contents
- Intro
- What is S3 Files
- What Changed for Developers
- How S3 Files Works at a High Level
- Why This Matters for AI
- Practical Code Example
- Security and Operational Details
- Key Takeaways
Intro
If you've spent any real time learning AWS, one of the first storage lessons you probably learned was that "Amazon S3 is not a file system."
And that mattered because it explained a lot of the weirdness you'd run into early on, the kind of thing that might have you asking...
Why can't I just mount S3 like a normal shared drive?
Why can't my application treat S3 data like regular files?
But the answer has always been... S3 is object storage. That's obvious, but it's a critical distinction, because as interesting as S3 Files is, it seems to be confusing a lot of people who only read the headlines.
So no... S3 did not suddenly turn into a file system.
So what is S3 Files?
S3 Files is a file system interface for data that lives in S3 and connects AWS compute resources directly with data in S3. It gives applications file system access without the data ever leaving S3 and maintains a view of the objects in the bucket while translating file system operations into efficient S3 requests.
Personally, the simplest way to think about S3 Files is this: S3 remains the storage layer, S3 Files becomes the file-access layer, and your app sees files and directories.
What changed for developers
Before S3 Files, many workloads had a choice to make.
Do you optimize for S3's scalability, durability and cost characteristics, or do you optimize for file system behavior that your applications already understand?
S3 Files aims to remove that tradeoff by making general purpose S3 buckets accessible as native file systems for AWS compute resources, with support for NFS v4.1+ file operations like create, read, update and delete.
This is kind of a big deal because it means more workloads can work against data where it already lives instead of forcing you to move it somewhere else first.
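Concretely, those create, read, update and delete operations are just ordinary file operations once the file system is mounted. Here's a minimal Python sketch; the mount path is passed in as an assumption, since any directory behaves the same from the application's point of view:

```python
from pathlib import Path

def crud_demo(mount_dir):
    """Run create/read/update/delete against a directory.
    In practice mount_dir would be an S3 Files mount point
    such as /mnt/s3files (hypothetical path)."""
    report = Path(mount_dir) / "report.txt"
    report.write_text("draft")    # create
    first = report.read_text()    # read
    report.write_text("final")    # update (overwrite)
    report.unlink()               # delete
    return first
```

Nothing in this code knows or cares that the backing store is S3, which is exactly the point.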
How S3 Files works at a high level
At a high level, S3 Files creates a file system view over an S3 bucket.
Your compute resource connects to that file system through a mount target in your VPC. For example, you can create the S3 file system, discover the mount target and then mount it from an EC2 instance using the s3files file system type.
Once mounted, your application can interact with the mounted path using normal file operations.
sudo mkdir /home/ec2-user/s3files
sudo mount -t s3files fs-1234567890abcdef:/ /home/ec2-user/s3files
After that, you can do familiar things like:
echo "Hello S3 Files" > /home/ec2-user/s3files/hello.txt
ls -al /home/ec2-user/s3files/hello.txt
When you create a file through the mounted file system, the file appears in the S3 bucket as an object and the contents remain the same when retrieved through the S3 API.
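That round trip can be sketched in a few lines: write through the mounted path, then read the same key back through the S3 API and compare. All names here are hypothetical, and `s3` is any S3 client exposing `get_object` (for example a boto3 client):

```python
from pathlib import Path

def roundtrip_matches(mount_dir, s3, bucket, key, text):
    """Write a file through the (mounted) file interface, then read
    the same key back through the S3 API and compare the contents."""
    Path(mount_dir, key).write_text(text)                 # file-system write
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]  # S3 API read
    return body.read().decode("utf-8") == text
```

On an EC2 instance you would call this with the real mount path (e.g. /home/ec2-user/s3files) and a boto3 client for the bucket backing the file system.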
That is the key idea in action: even though you're working through the file interface, S3 still stores the data underneath.
Why this matters for AI
S3 Files is clearly positioned for agents, machine learning and collaborative data-heavy workloads.
The announcement page specifically calls out file-based applications, agents and ML teams as beneficiaries. It states that AI agents can persist memory and share state across pipelines, ML teams can run data preparation workloads without duplicating or staging files first and thousands of compute resources can connect to the same S3 file system simultaneously.
That makes sense.
A lot of modern AI and ML tooling still behaves in very file-centric ways. Training inputs, checkpoints, logs, intermediate artifacts and tool-driven workflows often assume files and directories are available. Even when your long-term storage strategy is object-first, the workloads themselves may still be file-first.
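As a sketch of what "agents persisting memory" through the file interface could look like (all paths and names hypothetical): each agent writes its state as a JSON file under the mount, and any other worker attached to the same file system can read it back.

```python
import json
from pathlib import Path

def save_memory(base_dir, agent_id, memory):
    """Persist an agent's state as a JSON file. On an S3 Files mount,
    this file also appears as an object in the backing bucket."""
    path = Path(base_dir) / "agents" / f"{agent_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(memory))
    return path

def load_memory(base_dir, agent_id):
    """Read the state back, e.g. from another compute resource."""
    path = Path(base_dir) / "agents" / f"{agent_id}.json"
    return json.loads(path.read_text())
```

The same pattern covers checkpoints, logs and intermediate artifacts: the workload stays file-first while the data stays in S3.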
Practical code example
Here's a simple way to show the difference conceptually.
First, let's look at working directly with S3 objects in Python.
import boto3
s3 = boto3.client("s3")
bucket = "my-data-bucket"
key = "reports/summary.txt"
response = s3.get_object(Bucket=bucket, Key=key)
content = response["Body"].read().decode("utf-8")
print(content)
That works fine, but your code is explicitly written around the S3 API model.
Now let's review working with a mounted S3 Files path.
file_path = "/mnt/s3files/reports/summary.txt"
with open(file_path, "r") as f:
content = f.read()
print(content)
The second version feels ordinary because it is ordinary from the application's point of view.
Security and operational details worth noting
S3 Files integrates with IAM for access control, supports identity and resource policies at both the file system and object level, encrypts data in transit with TLS 1.3 and supports encryption at rest with SSE-S3 or customer-managed AWS KMS keys. You can also monitor it with CloudWatch metrics and CloudTrail management events.
So we can treat this as a real production service with access control, encryption and observability expectations teams will need.
Key takeaways for developers
The biggest takeaway is that the boundary between object storage workflows and file-based workflows has become much easier to work across.
That matters because a lot of cloud architecture pain comes from interfaces not matching reality, which forces teams to build adapters, stage pipelines, duplicate storage systems and synchronize logic just to keep things moving.
S3 Files now gives you the option to say, "maybe I shouldn't have to do that nearly as often."
Lastly, if you're looking to dive deeper in your learning journey, explore AWS's free training resources.
If you've made it this far, thanks for reading! I hope it was worthwhile.