
Haripriya Veluchamy


Serving ML Artifacts from Amazon S3 Files: How I Used It After the Launch

The Honest Story

Two months ago, everyone was posting about Amazon S3 Files. New feature, big announcement, screenshots everywhere. I scrolled past most of them: another AWS launch, another round of "here's what it does" posts.

I never actually understood what it was until it became my problem.

I was in the middle of my ML learning journey, building a semantic search project. The large FAISS indexes and BM25 artifacts all needed to be available at serving time. I was uploading them directly into my deployment. It worked, until it didn't. I hit the size limit. Container cold starts became painful. Every restart meant downloading hundreds of megabytes before the first request could be served.

I had one option left: S3.

And somewhere in the back of my head I remembered: "wait, there's that new S3 file system thing everyone was talking about." S3 is already my favorite AWS service. So why not try it?

I tried it. I built something real with it. And now I'm writing the post I wish existed two months ago: not "here's what S3 Files is" but "here's what actually happens when you use it."


What I Built

A semantic search engine over AWS documentation: 5048 pages, indexed with FAISS + BM25 hybrid retrieval, served via FastAPI. The entire ML artifact stack (indexes, metadata) is served directly from an S3 Files NFS mount on EC2. No cold start downloads. No boto3 in the serving code. Just open().

Stack:

  • Embeddings: all-MiniLM-L6-v2
  • Semantic search: FAISS (IndexFlatIP)
  • Keyword search: BM25Okapi
  • Serving: FastAPI + Jinja2
  • Artifact storage: Amazon S3 + S3 Files (NFS)
  • Compute: EC2 t3.medium (Ubuntu 22.04)

GitHub: Harivelu0/s3-files-ml-serving
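
How the FAISS and BM25 results get merged is not covered in this post. One common way to combine two ranked lists is reciprocal rank fusion, and the sketch below illustrates that idea only. It is an assumption for illustration, not necessarily what the repository does; the function name and the k=60 constant are illustrative choices.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranked list.

    Each document scores 1 / (k + rank) per list it appears in, so items
    ranked highly by both retrievers float to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical document IDs, best match first from each retriever.
faiss_ranking = ["a", "b", "c"]
bm25_ranking = ["b", "c", "d"]
fused = reciprocal_rank_fusion([faiss_ranking, bm25_ranking])
```

Fusing on ranks rather than raw scores sidesteps the fact that inner-product similarities and BM25 scores live on different scales.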


The Problem S3 Files Solves

Before S3 Files, serving large ML artifacts from containers looked like this:

# Every container startup
from huggingface_hub import hf_hub_download

faiss_path = hf_hub_download(repo_id="...", filename="faiss.index")  # 7MB
bm25_path  = hf_hub_download(repo_id="...", filename="bm25_index.pkl")  # 17MB
meta_path  = hf_hub_download(repo_id="...", filename="corpus_meta.pkl")  # 2.6MB

Every container restart meant downloading ~27MB before the first request. For larger models this becomes hundreds of MB or GBs. You'd need to manage download logic, handle failures, worry about /tmp size limits, and every container on the same host duplicates the same data.

With S3 Files, your serving code becomes:

import faiss
import pickle

# Just open(): no download, no boto3
index = faiss.read_index("/mnt/artifacts/faiss.index")
with open("/mnt/artifacts/bm25_index.pkl", "rb") as f:
    bm25 = pickle.load(f)

The S3 bucket is mounted as an NFS volume on the EC2 host. Your container reads from it like a local file. That's it.


Architecture

AWS Docs (5048 pages)
        ↓
crawl_aws_docs.py (Sitemap + BeautifulSoup)
        ↓
build_index.py (FAISS + BM25 + corpus_meta)
        ↓
S3 Bucket (versioning enabled)
        ↓
S3 File System (NFS layer on top of bucket)
        ↓
Mount Target (NFS endpoint inside VPC)
        ↓
EC2 Ubuntu 22.04
  └── /mnt/artifacts (S3 Files mounted here)
       └── Docker container
            └── FastAPI reads indexes directly
                 └── http://ec2-ip:8000

Setting Up S3 Files: What Actually Matters

1. S3 bucket needs versioning enabled

This one catches everyone. S3 Files will refuse to create a file system without it.

aws s3api put-bucket-versioning \
    --bucket your-bucket \
    --versioning-configuration Status=Enabled
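
If you would rather check this from a setup script before trying to create the file system, a small helper along these lines could work. It is a sketch: the function name is made up, and it relies only on the standard S3 `get_bucket_versioning` call, whose response omits `Status` when versioning was never configured.

```python
def versioning_enabled(s3_client, bucket: str) -> bool:
    """Return True only if the bucket reports versioning Status 'Enabled'."""
    response = s3_client.get_bucket_versioning(Bucket=bucket)
    # "Status" is absent for never-versioned buckets and "Suspended"
    # for buckets where versioning was turned off again.
    return response.get("Status") == "Enabled"


# Usage (needs boto3 and AWS credentials):
#   import boto3
#   print(versioning_enabled(boto3.client("s3"), "your-bucket"))
```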

2. IAM role trust principal is elasticfilesystem.amazonaws.com

Not s3files.amazonaws.com. Even though the service is called S3 Files, it's built on EFS under the hood.

{
  "Principal": { "Service": "elasticfilesystem.amazonaws.com" },
  "Condition": {
    "StringEquals": { "aws:SourceAccount": "YOUR_ACCOUNT_ID" }
  }
}
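
The fragment above shows only the Principal and Condition. As a rough sketch, here is how it might sit inside a complete trust policy document. The `Version`, `Effect`, and `sts:AssumeRole` fields are the usual wrapper for any IAM role trust policy and are my assumption here; whether S3 Files expects further conditions is not something this post covers. The function name and account ID are placeholders.

```python
import json


def build_trust_policy(account_id: str) -> dict:
    """Wrap the Principal and Condition from the post in a full trust policy."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "elasticfilesystem.amazonaws.com"},
                # Standard action for letting a service assume a role (assumed).
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id}
                },
            }
        ],
    }


# Placeholder account ID; substitute your own.
policy_json = json.dumps(build_trust_policy("123456789012"), indent=2)
```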

3. The IAM role needs EventBridge permissions

S3 Files uses EventBridge internally to monitor bucket changes. Without these permissions, your file system gets stuck in the creating state indefinitely.

{
  "Action": [
    "events:PutRule", "events:DeleteRule",
    "events:PutTargets", "events:RemoveTargets"
  ],
  "Resource": "*"
}
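
When a file system hangs in creating, a quick sanity check on the role's policy document can save some guessing. The helper below is a hypothetical sketch: it looks for the four actions listed above in the Allow statements of a policy document and handles only the two simplest wildcards. It is not a full IAM evaluator.

```python
REQUIRED_EVENT_ACTIONS = {
    "events:PutRule",
    "events:DeleteRule",
    "events:PutTargets",
    "events:RemoveTargets",
}


def missing_event_actions(policy_document: dict) -> set:
    """Return the required EventBridge actions not granted by the policy."""
    granted = set()
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # IAM allows a single action string
            actions = [actions]
        granted.update(actions)
    # Only the simplest wildcards are recognised in this sketch.
    if "*" in granted or "events:*" in granted:
        return set()
    return REQUIRED_EVENT_ACTIONS - granted
```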

4. boto3 API uses camelCase

The S3 Files boto3 client uses camelCase parameter names, unlike most other AWS services in boto3, which use PascalCase.

# Wrong
s3files.create_file_system(Bucket=bucket_arn, RoleArn=role_arn)

# Correct
s3files.create_file_system(bucket=bucket_arn, roleArn=role_arn)
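
If muscle memory keeps producing PascalCase, a tiny conversion helper makes the difference concrete. This is purely illustrative: the helper is hypothetical, and only the two parameter names in the example above are confirmed by this post, so treat any other converted name as a guess until the API accepts it.

```python
def to_camel_case_kwargs(kwargs: dict) -> dict:
    """Lower-case the first letter of each key: RoleArn -> roleArn."""
    return {key[:1].lower() + key[1:]: value for key, value in kwargs.items()}


# Placeholder values; real calls would pass the bucket and role ARNs.
converted = to_camel_case_kwargs({"Bucket": "bucket-arn", "RoleArn": "role-arn"})
```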

5. Mount needs amazon-efs-utils, not plain NFS

Plain mount -t nfs4 will fail. S3 Files requires the amazon-efs-utils package (v3.0+) which handles TLS + IAM auth automatically.

# Build and install amazon-efs-utils
sudo apt-get install -y cmake golang-go rustc cargo
git clone https://github.com/aws/efs-utils
cd efs-utils && ./build-deb.sh
sudo apt-get install -y ./build/amazon-efs-utils*.deb

# Then mount
sudo mount -t s3files fs-0xxxxxxxxx:/ /mnt/artifacts
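
After mounting, it can be worth confirming that /mnt/artifacts really is a mount and not just an empty local directory. The sketch below parses text in the Linux /proc/mounts format (device, mount point, file system type, options). The function name is made up, and I am not asserting which type string an S3 Files mount reports; the helper simply returns whatever is listed.

```python
from typing import Optional


def mount_fs_type(mount_point: str, mounts_text: str) -> Optional[str]:
    """Return the file system type for mount_point, or None if not mounted."""
    target = mount_point.rstrip("/") or "/"
    fs_type = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == target:
            fs_type = fields[2]  # a later entry shadows an earlier one
    return fs_type


# Usage on the EC2 host:
#   with open("/proc/mounts") as f:
#       print(mount_fs_type("/mnt/artifacts", f.read()))
```

A result of None means nothing is mounted at that path, so the container would be reading an empty host directory.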

6. EC2 instance needs AmazonS3FilesClientFullAccess

Without this policy on the EC2 role, the mount returns access denied by server.

aws iam attach-role-policy \
    --role-name your-ec2-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3FilesClientFullAccess

The Serving Code

FastAPI startup loads everything from the mount. No download logic. No error handling for network failures during download. The mount is always there.

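import os
import pickle
from contextlib import asynccontextmanager
from pathlib import Path

import faiss
from fastapi import FastAPI
from sentence_transformers import SentenceTransformer

ARTIFACTS_DIR = Path(os.getenv("ARTIFACTS_DIR", "/mnt/artifacts"))
state = {}  # module-level dict holding everything loaded at startup

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Reads directly from the S3 Files mount
    state["index"] = faiss.read_index(str(ARTIFACTS_DIR / "faiss.index"))

    with open(ARTIFACTS_DIR / "bm25_index.pkl", "rb") as f:
        d = pickle.load(f)
        state["bm25"] = d["bm25"]  # the pickle holds a dict; keep the BM25 object

    with open(ARTIFACTS_DIR / "corpus_meta.pkl", "rb") as f:
        state["meta"] = pickle.load(f)

    state["model"] = SentenceTransformer("all-MiniLM-L6-v2")
    yield

app = FastAPI(lifespan=lifespan)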

Container startup: model loads in ~2s. Artifacts available instantly from mount.
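
Dropping the download logic does not rule out a missing mount, for example if the host rebooted and nothing remounted the file system. A cheap guard is to check for the expected files before loading them. The sketch below assumes the three artifact file names used in this post; the helper name is hypothetical.

```python
from pathlib import Path

REQUIRED_ARTIFACTS = ("faiss.index", "bm25_index.pkl", "corpus_meta.pkl")


def find_missing_artifacts(artifacts_dir, names=REQUIRED_ARTIFACTS):
    """Return the artifact file names that are not present in artifacts_dir."""
    base = Path(artifacts_dir)
    return [name for name in names if not (base / name).is_file()]


# Possible use at the top of the startup hook:
#   missing = find_missing_artifacts("/mnt/artifacts")
#   if missing:
#       raise RuntimeError(f"Artifacts not found, is the mount up? {missing}")
```

Failing fast with a clear message beats a FAISS or pickle traceback that only hints at the real cause.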


Weekly Index Updates: The Real S3 Files Advantage

This is where S3 Files goes beyond just solving cold starts.

AWS updates their docs regularly. With traditional artifact serving you'd need to rebuild indexes, push a new container image, and redeploy, causing downtime.

With S3 Files:

# update_pipeline.sh runs weekly via EventBridge
python scripts/crawl_aws_docs.py --update   # only changed pages
python precompute/build_index.py            # rebuild indexes to /tmp

# Swap with no downtime. A plain mv from /tmp crosses filesystems (copy + delete),
# so stage each file on the mount first, then rename it into place.
for f in faiss.index bm25_index.pkl corpus_meta.pkl; do
    cp "/tmp/$f.new" "/mnt/artifacts/artifacts/$f.new"
    mv "/mnt/artifacts/artifacts/$f.new" "/mnt/artifacts/artifacts/$f"
done
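
The same swap can be done from Python if the pipeline is not a shell script. This is a sketch with a made-up function name: it copies the new file next to the destination under a temporary name, then renames it over the old one with `os.replace`. On a local POSIX file system that rename is atomic; whether S3 Files gives the same guarantee is an assumption I have not verified.

```python
import os
import shutil
from pathlib import Path


def publish_artifact(src, dest):
    """Copy src beside dest under a temporary name, then rename over dest."""
    dest = Path(dest)
    staged = dest.with_name(dest.name + ".new")
    shutil.copyfile(src, staged)  # slow part: data crosses filesystems
    os.replace(staged, dest)      # fast part: rename within one directory
    return dest


# Example with the paths from the pipeline above:
#   publish_artifact("/tmp/faiss.index.new", "/mnt/artifacts/artifacts/faiss.index")
```

Staging first means a reader opening the destination sees either the complete old file or the complete new one, not a partial copy.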

The serving container picks up the new indexes on the next query, provided the app re-reads the files when they change instead of loading them only at startup. Zero restart. Zero redeploy. The mount sees the updated S3 objects immediately.

This is the killer feature: not just cold start elimination, but live index updates without any deployment.
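
Picking up a new index without a restart means the app has to notice that the file changed. One simple way to do that is to compare the file's modification time and size before each use and reload only when they differ. The class below is a minimal sketch of that idea; its name is hypothetical and it is not taken from the repository.

```python
import os


class ReloadingArtifact:
    """Cache a loaded file and reload it when its mtime or size changes."""

    def __init__(self, path, loader):
        self.path = str(path)
        self.loader = loader      # e.g. faiss.read_index, or a pickle reader
        self._signature = None
        self._value = None

    def _current_signature(self):
        st = os.stat(self.path)
        return (st.st_mtime_ns, st.st_size)

    def get(self):
        signature = self._current_signature()
        if signature != self._signature:
            # First access, or the file was replaced since the last load.
            self._value = self.loader(self.path)
            self._signature = signature
        return self._value


# Possible use in the serving code:
#   index = ReloadingArtifact("/mnt/artifacts/faiss.index", faiss.read_index)
#   scores, ids = index.get().search(query_vectors, 10)
```

The cost is one stat call per query; a busier service might check on a timer instead.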


Results

The search UI running at http://98.93.65.241:8000:

  • Query: "aws sns permission issue" → 10 results in 334ms
  • Artifacts loaded at startup: instant (no download)
  • Index update: rebuild + upload to S3 → serving picks up without restart

What I Learned

S3 Files is genuinely useful for ML workloads where:

  • Artifacts are large (>100MB)
  • Multiple containers need the same data
  • Indexes need regular updates without downtime
  • You want to avoid baking artifacts into Docker images

It's not a replacement for EFS if you need pure file system performance. And it's not for every use case: if your artifacts never change and containers rarely restart, the complexity isn't worth it.

But if you've ever stared at a container downloading 500MB on every cold start, S3 Files is exactly what you were waiting for.

