DEV Community

Darian Vance

Posted on • Originally published at wp.me

Solved: I’ve benchmarked read latency of AWS S3, S3 Express, EBS and Instance store

🚀 Executive Summary

TL;DR: AWS storage services exhibit vastly different read latencies, with S3 Standard being unsuitable for low-latency, frequent access to small objects, leading to performance bottlenecks. The solution involves selecting purpose-built services like S3 Express One Zone for high-performance object storage, or DynamoDB/ElastiCache for key-value data, to align storage choice with specific application access patterns.

🎯 Key Takeaways

  • AWS storage services have distinct latency profiles: Instance Store (<1ms), EBS (1-2ms), S3 Express One Zone (5-10ms), and S3 Standard (50-150ms+).
  • S3 Standard is optimized for massive scale, durability, and high throughput for large objects, making it inefficient for low-latency, single-digit millisecond access to small, frequently read files.
  • S3 Express One Zone offers single-digit millisecond object storage latency but trades regional durability for zonal availability, requiring specific ‘directory bucket’ and zonal endpoint configurations.

Choosing the right AWS storage is a critical architectural decision that impacts more than just your monthly bill; it sets the latency floor for every request that touches it. A deep dive into real-world benchmarks for S3, S3 Express, EBS, and Instance Store helps you pick the right tool for the job and avoid costly performance bottlenecks.

S3 is Slow? Yeah, We Need to Talk About AWS Storage Latency.

I remember a frantic Tuesday afternoon. Our primary user authentication service, auth-svc-prod, started throwing timeout errors. Dashboards lit up like a Christmas tree, and the on-call pager was screaming. After an hour of tearing our hair out, tracing requests, and blaming the network team (sorry, guys), we found the culprit. A junior engineer, trying to be clever, had decided to store user session tokens as individual objects in a standard S3 bucket. The idea was sound on paper—durable, scalable storage. In reality, the 50-100ms of latency per read, multiplied by thousands of requests per minute, was enough to bring the entire system to its knees. It’s a classic mistake, one that stems from thinking of all storage as being equal. It’s not.

The “Why”: Not All Terabytes Are Created Equal

The core of the problem is a misunderstanding of what different AWS storage services are built for. Seeing a Reddit thread benchmarking the read latency of these services just brought that painful memory right back. It’s not that S3 is “bad” or “slow”—it’s that it’s designed for a different purpose. It’s object storage, built for massive scale, insane durability, and high throughput over the internet. It’s a miracle of engineering, but it’s not designed for consistent, single-digit millisecond access to small files from a compute instance right next to it.

Every GetObject call to standard S3 is an API call that traverses a complex network path, gets authenticated, and finds your data chunk across multiple availability zones. That takes time. Compare that to the other players:

| Storage Type | What It Is | Typical Latency (p99) | Best For |
| --- | --- | --- | --- |
| Instance Store (NVMe) | Local disk physically attached to the host. Ephemeral. | < 1ms | Scratch space, super-fast caches, buffers. Data you can afford to lose. |
| EBS (io2 Block Express) | Network-attached block storage. Persistent. | 1-2ms | Databases, boot volumes, persistent storage needing low latency. |
| S3 Express One Zone | High-performance object storage in a single AZ. | 5-10ms | AI/ML training, financial modeling, real-time analytics where speed is key. |
| S3 Standard | Regional, durable object storage. | 50-150ms+ | Backups, data lakes, static assets, large media files. Things that aren’t on the critical path of a user request. |

When you see it laid out like that, using S3 for session tokens is like using a cargo ship to deliver a pizza. It’ll get there, eventually, but your customer will have starved.

Okay, I Messed Up. How Do I Fix It?

If you’re in this situation, don’t panic. You have a few ways out, ranging from a quick patch to a proper architectural fix.

1. The Quick Fix: The “Cache-and-Carry”

This is the band-aid you apply to stop the bleeding during an outage. If your application on an EC2 instance is repeatedly reading the same few objects from S3, just cache them locally. The fastest storage you have is the one you’re already standing on—the instance’s own disk (either its ephemeral instance store or the root EBS volume).

It’s a hacky but effective way to immediately reduce S3 API calls and latency. You can write a simple script that pulls the required assets from S3 on startup or on the first request.

#!/usr/bin/env bash
# Simple cache warm-up to run on instance startup: pull the hot objects from
# S3 onto local disk so subsequent reads skip the S3 round trip entirely.
set -euo pipefail

CONFIG_BUCKET="s3://my-app-configs-prod"
CACHE_DIR="/var/cache/app_configs"

echo "Warming up local cache from S3..."
mkdir -p "$CACHE_DIR"
aws s3 cp "$CONFIG_BUCKET/production.json" "$CACHE_DIR/production.json"
aws s3 cp "$CONFIG_BUCKET/feature_flags.json" "$CACHE_DIR/feature_flags.json"
echo "Cache warm-up complete."

Warning: This introduces statefulness to your instances. The big gotcha here is cache invalidation. How do you update the file on the instance when the source object in S3 changes? This stopgap can quickly become a complex problem of its own. Use it to get stable, then plan for a real fix.
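One way to tame the invalidation problem is to key the local copy on the object's ETag and re-download only when it changes. A minimal sketch, reusing the hypothetical bucket from the warm-up script above; `remote_etag` is split out into its own function so the refresh logic can be exercised without live AWS credentials:

```shell
#!/usr/bin/env bash
# ETag-based refresh: re-download a cached object only when it changed in S3.
# BUCKET matches the hypothetical warm-up example; remote_etag() is isolated
# so the refresh logic can be tested without hitting AWS.
BUCKET="my-app-configs-prod"
CACHE_DIR="${CACHE_DIR:-/var/cache/app_configs}"

remote_etag() {  # $1 = object key; prints the object's current ETag
  aws s3api head-object --bucket "$BUCKET" --key "$1" \
    --query ETag --output text
}

refresh_if_stale() {  # $1 = object key
  local key="$1" etag_file="$CACHE_DIR/$1.etag" current
  mkdir -p "$CACHE_DIR"
  current=$(remote_etag "$key") || return 1
  if [ ! -f "$etag_file" ] || [ "$(cat "$etag_file")" != "$current" ]; then
    aws s3 cp "s3://$BUCKET/$key" "$CACHE_DIR/$key"
    echo "$current" > "$etag_file"
    echo "refreshed $key"
  else
    echo "cache hit $key"
  fi
}

# Usage, e.g. from a timer rather than per request:
#   refresh_if_stale production.json
```

Each check still costs one HeadObject round trip, so run it on a schedule, not on the request path; S3 Event Notifications are the push-based alternative if staleness windows matter.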

2. The Permanent Fix: Use the Right Tool for the Job

Once the fire is out, you need to refactor. The real solution is to move the data to a service designed for your access pattern. This is an architectural change.

  • For key-value data (like our session tokens): The answer is almost always a proper key-value store like DynamoDB or a caching service like ElastiCache (Redis). The latency will drop from ~100ms to <5ms for DynamoDB and <1ms for Redis. This is the correct pattern for this type of workload.
  • For a shared, persistent filesystem: If you need multiple instances to access the same “files” with low latency, look at EFS (Elastic File System) or attach a high-performance EBS volume to a single instance and serve the data from there.
  • For a small set of critical assets: Sometimes, it’s fine to just bundle the assets directly into your application’s deployment artifact (e.g., your Docker image). If the assets change infrequently, this is the simplest and fastest option.
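For the session-token case, the DynamoDB pattern is small enough to sketch with the AWS CLI. The table name and attribute names below are hypothetical, and the sketch assumes TTL has been enabled on the `ttl` attribute so expired sessions age out automatically:

```shell
#!/usr/bin/env bash
# Session tokens as DynamoDB items instead of S3 objects: each lookup is a
# single-digit-ms GetItem rather than a ~100ms S3 GetObject. Table name,
# attribute names, and the TTL field are hypothetical placeholders.
TABLE="user-sessions"

put_session() {  # $1 = token, $2 = user id; expires in one hour
  local expires=$(( $(date +%s) + 3600 ))
  aws dynamodb put-item --table-name "$TABLE" --item "{
    \"token\":   {\"S\": \"$1\"},
    \"user_id\": {\"S\": \"$2\"},
    \"ttl\":     {\"N\": \"$expires\"}
  }"
}

get_session() {  # $1 = token; prints the user id ("None" if absent)
  aws dynamodb get-item --table-name "$TABLE" \
    --key "{\"token\": {\"S\": \"$1\"}}" \
    --query 'Item.user_id.S' --output text
}

# Usage against a real table:
#   put_session "tok-abc" "user-42"
#   get_session "tok-abc"
```

In a real service you'd make these calls from the SDK rather than shelling out, but the data model is the same: the token is the partition key, and TTL replaces any hand-rolled cleanup job.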

3. The “Bleeding Edge” Fix: S3 Express One Zone

AWS saw this problem and created a new storage class to solve it: S3 Express One Zone. This is your “nuclear” option. It’s purpose-built to provide single-digit millisecond latency for object storage, targeting workloads like machine learning data processing and interactive analytics.

Think of it as S3 on steroids, but with some very important trade-offs:

  • Single AZ: As the name implies, your data lives in one Availability Zone. If that AZ has an issue, your data is unavailable. It’s less durable than standard S3.
  • Zonal Endpoints: You have to explicitly connect to a zonal endpoint in your code. It’s not a drop-in replacement.
  • Different Bucket Type: You have to create a new “directory bucket” specifically for this storage class.

This is the right choice when you truly need the S3 API model (e.g., for tools that speak S3 natively) but at latencies that approach EBS. It’s a powerful tool, but you’re trading resilience for speed, and you need to be very intentional about that decision.
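Getting started looks roughly like this with the AWS CLI; the bucket base name, AZ ID, and region are hypothetical placeholders, and directory bucket names must embed the AZ ID and end with `--x-s3`:

```shell
#!/usr/bin/env bash
# Sketch: creating an S3 Express One Zone "directory bucket". The base name,
# AZ ID, and region are hypothetical; adjust them to your account's zones.
AZ_ID="usw2-az1"
BUCKET="my-fast-bucket--${AZ_ID}--x-s3"

create_directory_bucket() {
  aws s3api create-bucket \
    --bucket "$BUCKET" \
    --region us-west-2 \
    --create-bucket-configuration \
      "Location={Type=AvailabilityZone,Name=$AZ_ID},Bucket={DataRedundancy=SingleAvailabilityZone,Type=Directory}"
}

# After creation, reads and writes go through the bucket's zonal endpoint;
# recent AWS CLI/SDK versions handle the CreateSession auth flow for you:
#   aws s3 cp ./shard-0001.bin "s3://$BUCKET/training/shard-0001.bin"
```

Co-locate your compute in the same AZ as the bucket: a cross-AZ hop gives back most of the latency you paid for.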

Pro Tip: Don’t guess, measure. Before you re-architect anything, use tools like CloudWatch and X-Ray to find your actual performance bottlenecks. The Reddit benchmark is a fantastic guide, but your application’s specific performance profile is the only truth that matters.
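A quick way to get your own numbers is a tiny timing harness. This sketch defaults to a local file read so it runs anywhere; point `READ_CMD` at a real S3 read (the bucket and key in the comment are hypothetical) to probe GetObject from your instance:

```shell
#!/usr/bin/env bash
# Tiny latency probe: time N runs of a read command, print min/p50/max in ms.
# Defaults to a local file read so the harness runs anywhere; to measure S3,
# set e.g. (hypothetical bucket/key):
#   READ_CMD='aws s3api get-object --bucket my-bucket --key probe.bin /dev/null'
probe_latency() {
  local n="${N:-20}" cmd="${READ_CMD:-cat /etc/hostname}"
  local i start end
  for i in $(seq 1 "$n"); do
    start=$(date +%s%N)              # GNU date: epoch nanoseconds
    eval "$cmd" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
  done | sort -n | awk '{ a[NR] = $1 }
    END { printf "min: %dms  p50: %dms  max: %dms\n", a[1], a[int((NR+1)/2)], a[NR] }'
}

probe_latency
```

Run it from the same instance type, region, and VPC configuration as production; latency measured from your laptop tells you about your ISP, not your architecture.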

At the end of the day, there’s no “best” storage. There is only the right storage for your specific use case. Understanding the latency characteristics is the first—and most important—step to building a system that’s not just scalable, but also fast.



👉 Read the original article on TechResolve.blog


Support my work

If this article helped you, you can buy me a coffee:

👉 https://buymeacoffee.com/darianvance
