How I Stumbled Into This Rabbit Hole
A few days ago, someone on LinkedIn dropped a post titled "S4 - Store Less Pay Less", and since then people have been (re-)posting their views on how awesome it is.
I clicked. I read. I raised an eyebrow.
Don't get me wrong — the engineering is impressive. Rust codebase, NVIDIA nvCOMP integration, intelligent codec dispatching, S3 wire compatibility, the whole nine yards. The README is polished, the benchmarks are thorough, and the alpha disclaimer is refreshingly honest. It's a genuinely cool piece of open-source work.
But here's the thing that kept nagging at me: we already solved this problem. And we solved it without adding a proxy layer, without managing GPU instances, without dealing with sidecar index files, and without the operational complexity of running yet another piece of infrastructure in the hot path of every S3 request.
That solution?
Amazon FSx for NetApp ONTAP. And specifically, its combination of inline deduplication, compression, automatic tiering to low-cost capacity pool storage, and — the kicker — native S3 Access Points that let you read your file data through S3 APIs without copying a single byte.
So I decided to write this down. Not to look down on a cool open-source project, but to make sure people know there's an alternative that might save them a lot more than just storage bytes.
What S4 Actually Does
S4 (Squished S3) is an S3-compatible storage gateway written in Rust. You point your S3 clients at it instead of AWS S3, and it transparently compresses every object before writing it to the backend bucket. On read, it decompresses. The app doesn't know anything changed — same SDK calls, same SigV4 auth, same everything.
It supports multiple codecs:
- CPU zstd (levels 1–22) for text and logs
- GPU nvCOMP (zstd, Bitcomp, GDeflate) for integer/columnar data like Parquet
- Passthrough for already-compressed files
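To make the dispatching idea concrete, here is a minimal sketch of how a gateway could pick a codec from a payload sample. This is not S4's actual logic: the function name, the thresholds, and the "mostly printable text" heuristic are my assumptions, and zlib stands in for a fast zstd probe because it ships with Python.

```python
import zlib


def pick_codec(sample: bytes, min_gain: float = 0.05) -> str:
    """Return a hypothetical codec label for a payload sample.

    Assumed heuristic, for illustration only (not S4's real dispatcher):
    - trial-compress the sample; if it barely shrinks, pass it through
    - if it looks like text, prefer the CPU codec
    - otherwise treat it as binary/columnar and prefer the GPU codec
    """
    if not sample:
        return "passthrough"

    # Cheap probe: zlib level 1 stands in for a fast zstd trial run.
    trial = zlib.compress(sample, 1)
    gain = 1 - len(trial) / len(sample)
    if gain < min_gain:
        return "passthrough"  # already compressed or random-looking

    # Crude text test: share of printable ASCII plus tab/newline/CR.
    printable = sum(1 for b in sample if 32 <= b < 127 or b in (9, 10, 13))
    if printable / len(sample) > 0.9:
        return "cpu-zstd"
    return "gpu-nvcomp"
```

A real dispatcher would likely sample several regions of the object and look at file type hints as well; the point is only that the decision can be made from a small probe before committing to a codec.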
The compression ratios are legitimately impressive. Nginx logs at 155×. Parquet-like data at 2×. Sorted integer columns at 11.9×. For the right workloads, the storage savings are real.
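A quick way to read those ratios is to convert them into the fraction of bytes you stop paying for, which is 1 - 1/ratio. The sketch below does that for the three figures quoted above; the function name is mine.

```python
def bytes_saved_fraction(ratio: float) -> float:
    """Fraction of stored bytes eliminated at a given compression ratio."""
    if ratio < 1:
        raise ValueError("a compression ratio below 1 means the data grew")
    return 1 - 1 / ratio


# Ratios quoted in the paragraph above.
for name, ratio in [("nginx logs", 155.0),
                    ("Parquet-like", 2.0),
                    ("sorted integers", 11.9)]:
    print(f"{name}: {ratio}x -> {bytes_saved_fraction(ratio):.1%} fewer bytes")
```

Notice the diminishing returns: 11.9× already removes about 92% of the bytes, and 155× removes a bit over 99%. Past a certain point the headline ratio matters far less to the bill than the share of your data that is compressible at all.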
But here's the architecture:
Your App → S4 Gateway (EC2 + GPU) → AWS S3 (real bucket)
S4 sits in the middle. Every PUT goes through compression. Every GET goes through decompression. Every request pays the latency tax of an extra network hop. And if the gateway goes down, your S3 access goes down with it — unless you've architected around it, which is yet more complexity.
The project is currently alpha / early-access with no public production deployments yet. The authors are upfront about this: "We do not yet recommend S4 as the only copy of irreplaceable data."
What FSx for ONTAP Already Does
Now let's look at the other side of the coin. Amazon FSx for NetApp ONTAP is a fully managed NetApp file system in AWS. It's been around for a while, and it's built on decades of NetApp storage engineering.
Here's what you get out of the box:
Inline Deduplication, Compression, and Compaction
ONTAP does this at the storage layer — not at a proxy layer, not as an afterthought. As data is written, it gets deduplicated (duplicate blocks are stored once), compressed (using algorithms tuned for the workload), and compacted (small files packed together efficiently). NetApp claims up to 90% storage efficiency savings in typical environments, and real-world customers see 50–65% regularly.
This happens inline, meaning the efficiency is applied before data hits the disk. No extra hops. No gateway. No sidecar files.
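If block-level deduplication plus compression sounds abstract, the toy sketch below shows the general idea: split data into blocks, store each unique block once, and compress what you store. It is an illustration of the technique only, not how ONTAP implements it; the fixed block size, the SHA-256 fingerprint, and zlib are all assumptions I picked for the example.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumption: fixed-size blocks, chosen for illustration


def store_blocks(data: bytes, store: dict[bytes, bytes]) -> list[bytes]:
    """Write data into a block store, keeping each unique block once.

    Returns the list of block fingerprints needed to rebuild the data.
    """
    refs = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).digest()
        if digest not in store:
            # New block: compress it before it is "written".
            store[digest] = zlib.compress(block)
        refs.append(digest)  # duplicates only cost a reference
    return refs


def read_blocks(refs: list[bytes], store: dict[bytes, bytes]) -> bytes:
    """Rebuild the original data from its block references."""
    return b"".join(zlib.decompress(store[ref]) for ref in refs)
```

Writing three identical blocks and one different block through `store_blocks` leaves two entries in the store, and `read_blocks` still returns the original bytes.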
Automatic Data Tiering to Capacity Pool
Here's where it gets really interesting for cost optimization. FSx for ONTAP automatically moves infrequently accessed data from SSD storage to a capacity pool tier. The capacity pool is priced at roughly $0.0219 per GB-month, compared to S3 Standard at $0.023 per GB-month, but with the massive difference that your data is still accessible via NFS, SMB, or S3 APIs without any application changes.
Wait, S3 APIs? Yes. We'll get to that.
S3 Access Points — The Game Changer
In late 2025, AWS launched S3 Access Points for FSx for ONTAP. This means you can attach an S3 Access Point to any FSx for ONTAP volume, and suddenly your file data is accessible via the S3 API. Bedrock can read it. Athena can query it. SageMaker can train on it. Glue can process it.
And here's the critical part: the data stays in the file system. There's no copy operation. No ETL pipeline. No data movement. No proxy. One copy of the data, accessible via three protocols (NFS, SMB, S3) simultaneously. If a file changes, all access methods see the change immediately.
Fully Managed, No Operational Overhead
FSx for ONTAP is an AWS managed service. You don't patch it. You don't scale it. You don't worry about GPU drivers or CUDA versions or nvCOMP SDK compatibility. You provision it, set your policies, and it runs. Backups, snapshots, replication, cloning — all built-in.
The Head-to-Head: Why S4 Feels Like Reinventing the Wheel
Let's walk through the problems S4 tries to solve and see how FSx for ONTAP addresses each one.
Problem 1: "My S3 bill is too high because of storage volume"
S4's approach: Compress objects before writing to S3. Pay for compressed bytes. Need a GPU instance to make it worthwhile at scale.
FSx for ONTAP's approach: Deduplicate and compress inline. Tier cold data to capacity pool at $0.0438/GB-month. No additional infrastructure. No compute cost to "pay for itself."
Let's do some math. S4's own cost analysis says you need roughly a $3,000/month S3 bill before the GPU instance cost ($730/month for a g6.xlarge) starts making sense. Below that, you're losing money.
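Here is a rough sketch of that break-even logic. The model is my own simplification, not S4's cost analysis: I assume storage is some share of the S3 bill, that compression removes some fraction of the stored bytes, and that requests and egress are unchanged. The 50% storage share is a made-up input; the 50% byte reduction is the low end of the range the S4 README advertises.

```python
def monthly_net_savings(s3_bill: float, storage_share: float,
                        byte_reduction: float, gateway_cost: float) -> float:
    """Net monthly savings after paying for the gateway instance.

    storage_share:  fraction of the S3 bill that is storage (assumption)
    byte_reduction: fraction of stored bytes removed by compression
    """
    return s3_bill * storage_share * byte_reduction - gateway_cost


def break_even_bill(storage_share: float, byte_reduction: float,
                    gateway_cost: float) -> float:
    """S3 bill at which compression savings equal the gateway cost."""
    effective = storage_share * byte_reduction
    if effective <= 0:
        raise ValueError("no savings possible with these inputs")
    return gateway_cost / effective


# Hypothetical inputs: half the bill is storage, half the bytes go away,
# and the gateway costs 730 per month as quoted above.
print(break_even_bill(0.5, 0.5, 730))            # 2920.0
print(monthly_net_savings(3000, 0.5, 0.5, 730))  # 20.0
```

With those inputs the break-even lands just under the roughly 3,000 per month figure. Plug in your own storage share: a request-heavy bill moves the break-even point up quickly.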
With FSx for ONTAP, if 80% of your 100 TB dataset is cold, you tier it to capacity pool. At 65% storage efficiency, your 100 TB becomes 35 TB of actual stored data. The cost lands around $0.059 per GB-month effective, and that includes SSD hot tier, capacity pool, throughput, and backups. No extra compute. No gateway to manage.
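The footprint part of that math is easy to sketch. The code below only sizes the two tiers and prices the capacity pool portion; it does not try to reproduce the blended per-GB figure, which also depends on SSD, throughput, and backup pricing. The assumptions are mine: efficiency applies evenly to hot and cold data, 1 TB is counted as 1,000 GB, and the price passed in is the capacity pool figure quoted under Problem 1.

```python
def tier_footprint(logical_tb: float, efficiency: float,
                   cold_fraction: float) -> dict[str, float]:
    """Split a dataset into SSD and capacity pool footprints.

    efficiency:    fraction of bytes removed by dedup/compression
    cold_fraction: fraction of the data that tiers to capacity pool
    Assumes efficiency applies evenly to hot and cold data.
    """
    physical_tb = logical_tb * (1 - efficiency)
    capacity_pool_tb = physical_tb * cold_fraction
    return {
        "physical_tb": physical_tb,
        "capacity_pool_tb": capacity_pool_tb,
        "ssd_tb": physical_tb - capacity_pool_tb,
    }


def monthly_cost(tb: float, price_per_gb_month: float) -> float:
    """Monthly storage cost, counting 1 TB as 1,000 GB (assumption)."""
    return tb * 1000 * price_per_gb_month


sizes = tier_footprint(logical_tb=100, efficiency=0.65, cold_fraction=0.8)
print(sizes)  # about 35 TB physical: 28 TB capacity pool, 7 TB SSD
print(round(monthly_cost(sizes["capacity_pool_tb"], 0.0438), 2))
```

Under these assumptions the cold 80% of the dataset occupies about 28 TB of capacity pool, which prices out at roughly $1,226 per month before the SSD tier, throughput, and backups are added.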
Problem 2: "I need to access my data via S3 APIs for analytics"
S4's approach: Your data is still in S3, just compressed. S3 APIs work natively.
FSx for ONTAP's approach: S3 Access Points expose your file data as S3 objects. AWS services read it directly. No proxy. No compression/decompression overhead on every request. The data is already there, already efficient, already accessible.
Problem 3: "I have mixed workloads — logs, Parquet, images"
S4's approach: Intelligent codec dispatching. GPU for columnar, CPU for text, passthrough for already-compressed.
FSx for ONTAP's approach: ONTAP's storage efficiency engines handle this automatically. Deduplication catches redundant blocks across all file types. Compression algorithms adapt to the data. And for truly incompressible data (images, videos), it just stores it efficiently without wasting CPU cycles trying to compress the incompressible.
Problem 4: "I need range reads for Parquet/ORC analytics"
S4's approach: S4IX sidecar files map compressed byte ranges to original byte ranges. Works, but adds complexity — every object has a sidecar, and if they get out of sync, range reads degrade.
FSx for ONTAP's approach: Native file system block mapping. Random reads work at the file system level. Parquet readers issue their byte-range requests against the file system, which serves them directly from the appropriate storage tier. No sidecars. No index files to maintain.
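To see why a compression gateway needs an index for range reads at all, here is a toy version of the idea: compress in independent chunks, record where each chunk starts in both the original and the compressed stream, and decompress only the chunks a range touches. This is not the S4IX format; the chunk size, the tuple layout, and the use of zlib are assumptions for illustration.

```python
import bisect
import zlib

CHUNK = 64 * 1024  # assumption: independent 64 KiB chunks


def compress_with_index(data: bytes) -> tuple[bytes, list[tuple[int, int, int]]]:
    """Compress data chunk by chunk and build a range index.

    Each index entry is (original_start, compressed_start, compressed_len).
    """
    blob = bytearray()
    index = []
    for start in range(0, len(data), CHUNK):
        comp = zlib.compress(data[start:start + CHUNK])
        index.append((start, len(blob), len(comp)))
        blob += comp
    return bytes(blob), index


def read_range(blob: bytes, index: list[tuple[int, int, int]],
               start: int, end: int) -> bytes:
    """Return original bytes [start, end), touching only the needed chunks."""
    if start < 0:
        raise ValueError("start must be non-negative")
    if start >= end or not index:
        return b""
    # Find the chunk that contains the first requested byte.
    starts = [entry[0] for entry in index]
    first = bisect.bisect_right(starts, start) - 1
    out = bytearray()
    for orig_start, comp_start, comp_len in index[first:]:
        if orig_start >= end:
            break
        out += zlib.decompress(blob[comp_start:comp_start + comp_len])
    base = index[first][0]
    return bytes(out[start - base:end - base])
```

The index is what lets a range read skip most of the object. It is also a second piece of state that has to stay consistent with the compressed blob, which is exactly the coupling a native file system avoids.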
Problem 5: "I need durability and don't want to lose data"
S4's approach: Alpha software. The authors explicitly say they do not yet recommend S4 as the only copy of irreplaceable data. You need backend versioning + replication as a safety net.
FSx for ONTAP's approach: Enterprise-grade NetApp ONTAP. RAID-DP, snapshots, cross-region replication, backup integration, compliance features. Battle-tested in production for decades.
But Wait — When Does S4 Actually Make Sense?
I promised an unbiased view, so let's be fair. S4 isn't useless. There are scenarios where it's the right tool:
You're locked into raw S3 and can't migrate
If your organization has a hard requirement to use S3 directly (perhaps for compliance, existing architecture, or political reasons) and you genuinely cannot move to a managed file system, S4 is a clever way to reduce storage costs without changing application code.
You want open-source, portable compression
S4's format is open (Apache-2.0), and you can decompress with standalone tools. If vendor lock-in is your primary concern and you want to own the compression format, that's a valid architectural choice.
You want to leverage cheaper S3 storage classes for old data
You get double savings: compression savings from S4, plus S3 Intelligent-Tiering moving the already-compressed data to lower-cost storage classes.
The Real Cost Nobody Talks About
Here's my biggest gripe with the S4 approach, and it's not about the technology — it's about total cost of ownership that goes beyond the AWS bill.
Operational complexity: S4 is a proxy. Proxies fail. Proxies need monitoring. Proxies need scaling. Proxies add latency. You now have a critical piece of infrastructure in the data path that isn't managed by AWS. When it breaks at 2 AM, your on-call engineer is debugging Rust logs and GPU memory allocation.
The sidecar problem: Every object has a .s4index sidecar. Lifecycle policies must move both files together. If they drift — one goes to Glacier, the other stays in Standard — range reads break. S4 has a repair-sidecar tool, which means this is a known operational headache.
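If you do run S4, a drift check along these lines is the kind of thing you would want in your monitoring. It is a sketch under assumptions: I take the sidecar key to be the object key plus the .s4index suffix, and the input is a plain mapping of key to storage class that you would build yourself from a bucket listing. It is not part of S4's tooling.

```python
SIDECAR_SUFFIX = ".s4index"  # suffix from the text; exact key layout is assumed


def find_drifted(storage_classes: dict[str, str]) -> list[str]:
    """Return object keys whose sidecar is missing or in another storage class.

    storage_classes maps every key in the bucket (objects and sidecars)
    to its S3 storage class, e.g. "STANDARD" or "GLACIER".
    """
    drifted = []
    for key, storage_class in sorted(storage_classes.items()):
        if key.endswith(SIDECAR_SUFFIX):
            continue  # sidecars are checked via their parent object
        sidecar_class = storage_classes.get(key + SIDECAR_SUFFIX)
        if sidecar_class != storage_class:
            drifted.append(key)
    return drifted


listing = {
    "logs/a.log": "GLACIER",
    "logs/a.log.s4index": "STANDARD",
    "logs/b.log": "STANDARD",
    "logs/b.log.s4index": "STANDARD",
}
print(find_drifted(listing))  # ['logs/a.log']
```

The check itself is trivial. The operational cost is that somebody has to own it, schedule it, and respond when it fires.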
The latency tax: Every GET pays decompression overhead. Every PUT pays compression overhead. For analytics workloads that do thousands of small range reads, this adds up. S4's own docs admit: "not fine for an OLTP-style hot read path."
The GPU dependency: To get the best ratios, you need NVIDIA GPUs with nvCOMP. That means specific EC2 instance types, CUDA drivers, and SDK compatibility. It's not "apt-get install and forget."
Compare this to FSx for ONTAP: you create the file system, set your tiering policy, attach an S3 Access Point, and you're done. The efficiency happens transparently. The tiering happens automatically. The S3 access happens natively. No proxies. No sidecars. No GPU drivers.
My Take: Look Under the Hood Before You Build the Engine
The S4 project is a fantastic engineering exercise. The benchmarks are real. The code is clean. The team is transparent about limitations. If I were building a pure S3 compression gateway from scratch, this is probably what I'd want it to look like.
But here's the thing — most people don't need to build a compression gateway. They need cheaper storage, efficient data access, and S3 compatibility for their analytics pipelines. And that problem is already solved by a managed service that does it better, more reliably, and with less operational overhead.
FSx for ONTAP isn't perfect. It has its own pricing model to understand (SSD + capacity pool + throughput + IOPS). It's not as "cool" as a Rust GPU gateway. But it's production-ready today, it's fully managed by AWS, and it solves the exact same problem S4 is trying to solve — plus a dozen more (snapshots, cloning, replication, multi-protocol access) that S4 doesn't even attempt.
The S4 README has a section titled "When NOT to use S4" that's refreshingly honest. It lists small objects, metadata-heavy workloads, ultra-low-latency requirements, and regulatory environments. I'd add one more: "When Amazon FSx for NetApp ONTAP with S3 Access Points meets your requirements."
Final Thoughts
If you're a startup with a $50,000/month S3 bill, a tolerance for alpha software, and an engineer who loves Rust and CUDA, S4 might be worth a pilot. The math works at that scale, and the open-source nature gives you control.
But if you're an enterprise with mixed workloads, compliance requirements, a need for file and object access to the same data, and a preference for managed services over self-hosted proxies — FSx for ONTAP is almost certainly the better path. You get the storage efficiency, the tiering, the S3 compatibility, and the enterprise data services, all without the operational burden of running a gateway.
Sometimes the best engineering decision isn't to build the coolest solution. It's to recognize that someone already built a better one, and use it.
References
abyo-software/s4: GPU-accelerated transparent compression S3-compatible storage gateway. Drop-in replacement for AWS S3 endpoints; cuts your S3 bill 50-80% with no app changes (Rust, nvCOMP, zstd).
S4 — Squished S3
Drop-in S3-compatible storage gateway with GPU-accelerated transparent compression. Reduces S3 storage bytes 50–80% for compressible payloads (logs, JSON Parquet/ORC) without changing application code. Total bill impact depends on workload mix — request cost / egress / GPU compute are unchanged.
Headline numbers (RTX 4070 Ti SUPER + Ryzen 9 9950X, single-pass roundtrip through s4-codec, last benchmarked 2026-05-13 on nvCOMP 5.2.0.10 / CUDA 13.2 driver 595.58.03; full table + reproduction recipe below):
| Workload | Best ratio | Best compress throughput | Codec verdict |
|---|---|---|---|
| nginx access log (256 MiB) | 155× (cpu-zstd-3) | 3.7 GB/s (cpu-zstd-3) | CPU wins — text deduplicates well at low CPU cost |
| Parquet-like mixed (256 MiB) | 2.09× (nvcomp-bitcomp) | 1.5 GB/s (nvcomp-bitcomp) | GPU wins on Bitcomp for integer/columnar layouts |
| Postings (u32, 64 MiB) | 11.9× (nvcomp-bitcomp) | 1.6 GB/s (nvcomp-bitcomp) | GPU wins decisively on monotonic integer columns |
| Already-compressed (64 MiB) | 1.00× (passthrough) | 2.2 GB/s (passthrough) | Dispatcher detects |
Disclosure: I work for NetApp as a Solutions Architect for Amazon FSx for NetApp ONTAP. My views are my own, and I've tried to be fair to both approaches. S4 is cool. FSx for ONTAP is practical. Your mileage may vary.