From "why can't I just mount S3 like a drive?" to AWS finally answering that question in 2026.
I've had that conversation more times than I can count.
A developer joins a new AWS project, looks at the architecture, and asks: "We're already storing everything in S3 — why do we also need EFS? Can't we just mount S3 directly?"
And every time, the answer was the same patient explanation about object storage vs file systems, why they're fundamentally different, and why you need separate services for separate workloads. It was the right answer. It just wasn't a satisfying one.
That changed in April 2026 when AWS launched S3 Files — and suddenly that conversation got a lot shorter.

But before we get there, let's start from the beginning. Because understanding why S3 Files matters requires understanding the problem it's solving. And that means understanding the full AWS storage landscape.
The AWS Storage Trinity (Before S3 Files)
AWS has three primary storage services, each built for a completely different purpose. Engineers often get confused because on the surface they all seem to do the same thing: store data. But the way they store it — and who can access it and how — is completely different.
Here's the simplest way I know to think about it:
- S3 is like a giant library. You can store billions of books (objects), and anyone with the right access can retrieve any book. But to fix a typo on page 47, you have to reprint the entire book.
- EBS is like a hard drive physically attached to your computer. Super fast, but only your computer can use it.
- EFS is like a shared office filing cabinet on a network. Anyone in the office can open a drawer, pull out a folder, and edit a document — at the same time.
Let's go deeper on each one.
Amazon S3 — Object Storage Built for Scale
S3 (Simple Storage Service) launched in 2006 and fundamentally changed how the world thinks about storing data. The core idea is simple: you have buckets, and inside buckets you store objects. Each object is just a file plus its metadata, stored at a unique key (think of it like a URL).
What makes S3 special
- Virtually unlimited scale. S3 stores more than 500 trillion objects across hundreds of exabytes today.
- 11 nines of durability (99.999999999%). AWS automatically replicates your data across at least three Availability Zones.
- Pay only for what you use. No minimum capacity, no infrastructure to manage.
- Multiple storage classes. From S3 Standard (~$0.023/GB-month) down to Glacier Deep Archive (~$0.00099/GB-month) for data you almost never touch.
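To make the spread between storage classes concrete, here's a back-of-the-envelope sketch using the approximate us-east-1 rates quoted above. The `monthly_cost` helper is just for illustration — a real bill also includes request, retrieval, and transfer fees.

```python
# Rough monthly storage cost for 50 TB at the approximate rates above.
# Real S3 bills add request, retrieval, and data-transfer charges.
GB_PER_TB = 1024
RATES = {  # $/GB-month, us-east-1, approximate
    "S3 Standard": 0.023,
    "Glacier Deep Archive": 0.00099,
}

def monthly_cost(tb: float, storage_class: str) -> float:
    """Storage-only monthly cost in dollars for `tb` terabytes."""
    return tb * GB_PER_TB * RATES[storage_class]

for cls in RATES:
    print(f"{cls}: ${monthly_cost(50, cls):,.2f}/month")
# prints: S3 Standard: $1,177.60/month
# prints: Glacier Deep Archive: $50.69/month
```

Same 50 TB, roughly a 23x difference — which is why lifecycle policies that tier cold data downward matter so much at scale.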
The one thing S3 cannot do
Here's the catch that trips everyone up: S3 is not a file system.
When you store something in S3, it becomes an immutable object. If you want to change even a single character in a file, you have to download the entire object, make your change, and re-upload the whole thing as a new object. There's no such thing as "open this file and edit line 47." That's just not how object storage works.
This isn't a bug — it's by design. The immutability of objects is part of what makes S3 so durable and scalable. But it creates real friction for any workload that needs to work with data the way normal applications do: open a file, read some bytes, write some bytes, save.
# What you can do with S3
aws s3 cp myfile.txt s3://my-bucket/myfile.txt # upload
aws s3 cp s3://my-bucket/myfile.txt ./myfile.txt # download
aws s3 rm s3://my-bucket/myfile.txt # delete
# What you CANNOT do
# Open myfile.txt and append a line — impossible without full re-upload
Amazon EBS — The Fast Attached Drive
EBS (Elastic Block Store) is block storage — the AWS equivalent of an SSD attached directly to your server. When you launch an EC2 instance, the root volume (where the operating system lives) is an EBS volume.
What EBS is good at
- Speed. EBS delivers single-digit millisecond latency because it behaves like a local disk.
- POSIX semantics. You can open files, write individual bytes, seek to specific positions — everything a normal file system supports.
- Consistency. What you write is immediately readable. No eventual consistency concerns.
The hard limit of EBS
EBS volumes can only be attached to one EC2 instance at a time. (io1/io2 volumes support Multi-Attach, but only for cluster-aware applications — a narrow exception, not a general escape hatch.)
This means if you have a cluster of 10 EC2 instances all running your application, each one needs its own EBS volume. They can't share data through EBS. If instance A writes a file, instance B can't see it without some kind of sync mechanism.
EC2 Instance A → EBS Volume A (can't share)
EC2 Instance B → EBS Volume B (separate, isolated)
EC2 Instance C → EBS Volume C (separate, isolated)
For single-instance workloads — databases, operating system volumes, single-server applications — EBS is excellent. The moment you need shared storage across multiple servers, you hit a wall.
Amazon EFS — The Shared Network Drive
EFS (Elastic File System) is AWS's managed Network File System (NFS). Think of it as a shared drive that any number of EC2 instances, containers, or Lambda functions can mount simultaneously and use like a local file system.
What EFS solves
- Concurrent access. Thousands of compute resources can mount and use the same EFS volume at the same time.
- Full POSIX semantics. Open files, edit bytes in-place, file locking, directory operations — everything works.
- Scales automatically. The file system grows and shrinks as you add or remove files. No capacity planning required.
- Sub-millisecond latency on Standard tier.
EC2 Instance A ──┐
EC2 Instance B ──┼──→ EFS Volume (all share the same files)
EC2 Instance C ──┤
Lambda Function ─┘
Where EFS falls short
The pricing model. EFS charges you for every gigabyte stored, whether you touched it this month or not. Standard tier is $0.30/GB-month — roughly 13x more expensive than S3 Standard per gigabyte.
This is fine when your data is "hot" (actively accessed). It's painful when you have petabytes of data where only a fraction is actively used at any time. You end up paying full file system prices for data that's sitting idle.
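The "13x" figure above falls straight out of the per-GB rates. A quick sketch, using the article's approximate us-east-1 prices:

```python
# Storing the same 100 TB on EFS Standard vs S3 Standard, storage cost only,
# at the approximate per-GB-month rates quoted in this article.
GB = 100 * 1024
efs_monthly = GB * 0.30   # EFS Standard: $0.30/GB-month
s3_monthly = GB * 0.023   # S3 Standard: $0.023/GB-month

print(f"EFS: ${efs_monthly:,.0f}  S3: ${s3_monthly:,.0f}  "
      f"ratio: {efs_monthly / s3_monthly:.1f}x")
# prints: EFS: $30,720  S3: $2,355  ratio: 13.0x
```

And EFS charges that rate on every stored gigabyte, hot or idle — there's no cheap tier to park the cold 90% in while keeping file-system access.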
And the other problem: EFS has zero native integration with S3. They're completely separate systems. Your data lake is in S3. Your compute needs EFS. So you write sync scripts to copy data back and forth — and now you have two copies of everything, two storage bills, and a manual process that breaks at the worst possible times.
The Old Workflow Pain (The Problem All of This Creates)
Before S3 Files, a typical ML or data engineering team's workflow looked like this:
S3 Data Lake
↓ (manual copy — takes time, costs money)
EFS Volume
↓ (mount on EC2)
EC2 Training Job
↓ (output back to EFS)
↓ (another manual copy)
S3 Data Lake ← results stored here for analytics
Every arrow in that diagram is a point of failure. Every copy step is a delay, a cost, and a potential for the two copies to drift out of sync. Engineers were spending real engineering hours maintaining these sync pipelines — hours that weren't building anything valuable.
This is the problem that s3fs tried to solve, years before AWS had an official answer.
s3fs-fuse — The Community's Workaround
If you've been working with AWS for a few years, you've probably encountered s3fs-fuse. It's an open-source FUSE (Filesystem in Userspace) tool that lets you mount an S3 bucket as a local directory on Linux, macOS, or FreeBSD.
# Install
sudo apt-get install s3fs
# Configure credentials
echo "ACCESS_KEY_ID:SECRET_ACCESS_KEY" > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs
# Mount your bucket
s3fs my-bucket /mnt/s3-data -o passwd_file=~/.passwd-s3fs
After that, you can run ls, cp, cat — your S3 bucket looks like a local folder. For a quick demo or a simple use case, it feels magical.
What's actually happening under the hood
Here's the thing nobody tells you upfront: s3fs isn't really giving you file system access to S3. It's translating file commands into S3 API calls — and the translation has serious limitations.
When you "edit" a file through s3fs, this is what actually happens:
You: nano myfile.txt (make a small change, save)
↓
s3fs: GET entire object from S3 → download to local temp cache
s3fs: You edit the local temp copy
s3fs: On file close → PUT entire object back to S3 (full re-upload)
Change one character in a 10GB file? s3fs downloads all 10GB, makes the change, and uploads all 10GB again. Every time.
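The write amplification is easy to quantify. This is a small illustrative model (the function name `s3fs_edit_traffic` is mine, not part of s3fs) of what the GET-edit-PUT cycle above moves over the network:

```python
# s3fs write amplification: an in-place edit through s3fs transfers the
# entire object twice, no matter how small the change is.
def s3fs_edit_traffic(object_bytes: int, edit_bytes: int) -> int:
    """Total bytes on the wire to edit an object through s3fs."""
    download = object_bytes   # GET: whole object into the local temp cache
    upload = object_bytes     # PUT: whole modified object back on file close
    return download + upload  # edit_bytes never enters the equation

ten_gb = 10 * 1024**3
print(s3fs_edit_traffic(ten_gb, edit_bytes=1) / 1024**3, "GiB transferred")
# prints: 20.0 GiB transferred
```

One changed byte, twenty gibibytes of traffic — and you pay S3 request and transfer costs on every round trip.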
The real limitations you need to know
No file locking. If two processes try to write to the same file through s3fs at the same time, you get data corruption. Not an error message — silent data corruption.
No atomic renames. Renaming a file in s3fs copies it to a new key and deletes the old one. Any application that relies on atomic renames (which includes most databases and many log processors) will break.
Slow directory listings. Every ls is a ListObjects API call to S3. On a bucket with millions of objects, this is painfully slow.
No hard links or symbolic links. S3 simply doesn't support them.
| Operation | What s3fs does | Problem |
|---|---|---|
| Read file | GET entire object | Slow for large files |
| Edit file | Download → edit → full PUT | Expensive re-upload |
| Append to file | Rewrite entire object | Very expensive |
| Rename file | Copy + Delete | Not atomic |
| File lock | Not supported | Data corruption risk |
| List directory | ListObjects API call | Slow on large buckets |
s3fs works well for lightweight, read-heavy, single-process use cases. But the moment you need multi-process access, in-place edits, or production reliability — it starts breaking down. The community built it because AWS didn't have a better answer. Eventually, AWS tried building their own version.
Mountpoint for S3 — AWS's Open-Source Attempt (2023)
In 2023, AWS released Mountpoint for S3, their own open-source FUSE client. It was faster than s3fs-fuse and better optimized for cloud-native, read-heavy workloads.
But it still couldn't do in-place edits, directory renames, or file locking. It was better than s3fs-fuse, but it still hit the same fundamental ceiling: you can't make S3's API behave like a real file system by pretending.
AWS knew this. Internally, they'd been trying to solve it properly for years.
Amazon S3 Files — The Real Solution (April 2026)
On April 7, 2026, AWS launched S3 Files — and it's the most significant S3 update since the service launched.
The internal project was even called "EFS3" at one point. One engineer on the team described the design process as "a battle of unpalatable compromises." Getting object storage and file system semantics to truly coexist is genuinely hard engineering. Every design decision forced a tradeoff where either the file presentation or the object presentation had to give something up.
What they landed on is clever: instead of trying to make the S3 API behave like a file system (which is what s3fs does), they did the opposite — they took a real, production-grade file system (EFS) and connected it directly to S3 storage.
How S3 Files actually works
S3 Files uses a two-tier architecture:
Tier 1 — EFS Cache Layer (hot data)
- Stores your active working set: recently written files, recently read files, metadata
- Delivers ~1ms latency
- Serves small files (under 128KB by default) entirely from cache
- Handles all NFS file operations — open, read, write, rename, lock
Tier 2 — S3 Bucket (your full dataset)
- Holds your complete data at normal S3 prices (~$0.023/GB)
- Large reads (1MB+) bypass the cache entirely and stream directly from S3 for free
- Changes made through the file system sync back to S3 automatically within minutes
Your Application
↓ (NFS mount — standard Linux file operations)
EFS Cache Layer ←→ Smart Router
↓ ↓
Hot data Cold/large data
(~1ms) (streams from S3, free)
↓ ↓
└────────────────────┘
↓
S3 Bucket
(your data, always here)
The key insight: your data never leaves S3. The EFS cache is just a smart caching layer on top. You're not maintaining two copies — you have one copy in S3, accessible via both the S3 API and the file system mount simultaneously.
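The routing logic the tiers imply can be sketched as a small decision function. To be clear, this is my hypothetical model of the behavior described above, not AWS code — the only numbers taken from the article are the 128KB cache-serve default and the 1MB+ direct-stream threshold:

```python
# Hypothetical sketch of the read-path routing described above — not AWS code.
# Thresholds from the article: files under 128 KB serve entirely from the EFS
# cache; reads of 1 MB+ bypass the cache and stream directly from S3.
CACHE_MAX = 128 * 1024     # serve entirely from cache below this
DIRECT_MIN = 1024 * 1024   # stream straight from S3 at or above this

def route_read(size_bytes: int, in_cache: bool) -> str:
    if size_bytes < CACHE_MAX and in_cache:
        return "efs-cache"   # hot path, ~1 ms
    if size_bytes >= DIRECT_MIN:
        return "s3-direct"   # bypasses the cache entirely
    return "cache-fill"      # mid-sized or cold: pull into cache, then serve

print(route_read(4 * 1024, in_cache=True))       # prints: efs-cache
print(route_read(50 * 1024**2, in_cache=False))  # prints: s3-direct
```

The design point is that each tier handles the access pattern it's priced for: the expensive cache holds only small, hot files, while big sequential reads never touch it.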
Getting started in 3 steps
Step 1: Create an S3 file system
In the AWS Console → S3 → File Systems → Create file system. Enter your bucket name, done.
Or via CLI:
aws s3api create-file-system --bucket my-bucket
aws s3api create-mount-target --file-system-id fs-xxxx --subnet-id subnet-xxxx
Step 2: Mount it on your EC2 instance
Make sure the amazon-efs-utils package is installed (preinstalled on AWS AMIs), then:
sudo mkdir /mnt/s3files
sudo mount -t s3files fs-0aa860d05df9afdfe:/ /mnt/s3files
Step 3: Use it like any local directory
# Create a file
echo "Hello S3 Files" > /mnt/s3files/hello.txt
# Edit it in place
echo "New line added" >> /mnt/s3files/hello.txt
# List files
ls -la /mnt/s3files/
# The same data is accessible via S3 API too
aws s3 ls s3://my-bucket/
Changes you make through the file system mount appear in S3 within minutes. Changes made directly to the S3 bucket appear in the file system within seconds.
Security — what you need to know
- IAM integration for access control at both file system and object level
- Data encrypted in transit using TLS 1.3
- Data encrypted at rest using SSE-S3 (or KMS if you prefer customer-managed keys)
- POSIX permissions (UID/GID) stored as S3 object metadata
- Monitor via CloudWatch metrics and CloudTrail logs
Pricing — the part that actually makes sense
S3 Files charges EFS-level rates, but only on the fraction of data you're actively working with:
| What you pay for | Rate |
|---|---|
| High-performance storage (hot data) | $0.30/GB-month |
| Reads (small files served from cache) | $0.03/GB |
| Writes | $0.06/GB |
| Everything else in your S3 bucket | Standard S3 rates (~$0.023/GB) |
If you have a 100TB dataset but only 1TB is actively used at any time — you pay EFS rates on 1TB and S3 rates on the other 99TB. AWS claims up to 90% cost savings compared to the old pattern of cycling data between S3 and a dedicated EFS volume.
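Working that 100TB example through with the rates in this article shows where the claimed savings come from. The comparison below assumes the old pattern meant keeping the full dataset on EFS so jobs could mount it:

```python
# The 100 TB / 1 TB hot-set example above, at this article's rates.
# Old pattern (assumed): whole dataset on EFS so compute can mount it.
# S3 Files: EFS-rate cache on the 1 TB working set, S3-rate on the rest.
GB_PER_TB = 1024
EFS_RATE, S3_RATE = 0.30, 0.023  # $/GB-month, us-east-1, approximate

old = 100 * GB_PER_TB * EFS_RATE
new = 1 * GB_PER_TB * EFS_RATE + 99 * GB_PER_TB * S3_RATE

print(f"old ${old:,.0f}/mo  new ${new:,.0f}/mo  saving {1 - new / old:.0%}")
# prints: old $30,720/mo  new $2,639/mo  saving 91%
```

That lands right in the "up to 90%" range AWS claims — and it ignores the per-GB read/write charges, so treat it as an upper bound that shrinks with heavier cache churn.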
Putting It All Together — Which Service Should You Use?
Here's the honest answer:
| Use this | When you need |
|---|---|
| S3 | Bulk storage, backups, data lakes, analytics, static assets, anything accessed via API |
| EBS | OS volumes, databases, single-instance high-performance storage |
| EFS | Shared file system for legacy NAS migration, on-premises workloads moving to cloud, apps that need pure NFS without S3 |
| S3 Files | ML pipelines, agentic AI workflows, data engineering, any workload where both S3 API and file system access are needed |
| s3fs-fuse | Quick prototypes, read-heavy single-process scripts, legacy apps where you can't change the architecture |
Why This Matters for ML and AI Workloads
If you're building machine learning pipelines or agentic AI systems, S3 Files is worth paying close attention to.
The old workflow was: data lives in S3 → copy to EFS before training → run training job → copy results back to S3. For large datasets, that copy step alone could take hours. You were also paying double storage costs during the transition.
With S3 Files, your training job mounts the S3 bucket directly. The EFS cache warms up as your training reads data. No copy step. No sync script. No duplicate storage.
For agentic AI systems specifically — where multiple agents need to coordinate through shared files, read from each other's outputs, maintain shared state — S3 Files provides exactly the concurrent NFS access with close-to-open consistency that these workloads need. Standard Python file operations, standard shell tools, all working against data that lives in S3.
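That "standard Python file operations" point is the whole appeal. A minimal sketch of the hand-off pattern — here a temp directory stands in for the /mnt/s3files NFS mount, since the same plain `open()`-style calls would apply on the real mount:

```python
# Agents coordinating through shared files — the pattern S3 Files enables.
# A temp directory stands in for the /mnt/s3files mount in this sketch.
import json
import tempfile
from pathlib import Path

shared = Path(tempfile.mkdtemp())  # pretend this is the S3 Files mount

# Agent A writes its output where other agents can find it — no S3 SDK.
(shared / "agent_a.json").write_text(
    json.dumps({"status": "done", "rows": 42})
)

# Agent B picks it up with ordinary file reads.
result = json.loads((shared / "agent_a.json").read_text())
print(result["rows"])  # prints: 42
```

On a real mount, close-to-open consistency means agent B sees agent A's full output once A has closed the file — which is exactly the coordination guarantee this hand-off pattern relies on.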
The Short Version
For a decade, AWS storage was a choice: pay S3 prices and lose file system semantics, or pay EFS prices and lose S3 integration. Teams wrote sync scripts, maintained duplicate data, and spent engineering time on storage plumbing instead of actual product work.
s3fs-fuse was the community's best attempt at a workaround — and it worked, up to a point. But it was always emulating file system behavior on top of an API that wasn't designed for it.
S3 Files is the first time AWS has genuinely solved this at the right layer. Real NFS semantics, real S3 storage, real production reliability. One bucket, two protocols, no compromises.
If you've ever maintained a sync script between your data lake and your compute layer — you know exactly what problem this solves. And you know exactly how good it feels to delete that script.
Resources
- Amazon S3 Files product page
- AWS Blog: Launching S3 Files
- S3 Files documentation
- s3fs-fuse on GitHub
- Amazon S3 pricing
- Amazon EFS pricing
- Intro to S3 Files by Darko Mesaros
Published April 2026. All pricing figures reflect us-east-1 as of the time of writing.
If this helped you, drop a reaction or leave a comment — curious what storage patterns others are running into in the wild.