Most archive formats make a simple task unnecessarily expensive: you need one file, so you download and decompress everything.
I built ARCX, a compressed archive format designed to fix that.
ARCX combines cross-file compression (like tar+zstd) with indexed random access (like zip), so you can retrieve a single file from a large archive in milliseconds without decompressing the rest.
## Try it

GitHub: https://github.com/getarcx/arcx

Install: `cargo install arcx`
## Benchmark results
Across 5 real-world datasets:
- ~7ms to retrieve a file from a ~200MB archive
- up to 200x less data read than tar+zstd
- compression within ~3% of tar+zstd
Example:
| Dataset | ARCX Bytes Read | TAR+ZSTD Bytes Read | Reduction |
|---|---|---|---|
| Python ML | 326 KB | 63.1 MB | 198x less |
| Build Artifacts | 714 KB | 140.4 MB | 202x less |
## Why this matters
Modern systems don't need entire archives. They need one file, immediately.
This shows up in:
- CI/CD pipelines (artifacts)
- cloud storage (partial retrieval)
- large codebases
- package registries
ARCX reduces archive access to a manifest lookup, one block read, and one block decompress.
## How it works
ARCX uses:
- block-based compression
- a binary manifest index
- direct offset reads
Instead of scanning or decompressing the full archive:
1. Look up the file in the index
2. Seek to the relevant block
3. Decompress only that block
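Those three steps can be sketched in a few lines of Rust. The `Entry` layout here is a hypothetical stand-in for the manifest, and the "decompress" step is identity so the sketch stays dependency-free; ARCX's real on-disk format uses block-based zstd compression and will differ in the details.

```rust
use std::collections::HashMap;
use std::io::{Cursor, Read, Seek, SeekFrom};

// Hypothetical manifest entry; ARCX's actual index format will differ.
struct Entry {
    block_offset: u64, // where the compressed block starts in the archive
    block_len: u64,    // length of that block on disk
}

/// Step 1: look up the file in the index.
/// Step 2: seek to the relevant block.
/// Step 3: decompress only that block (identity here; zstd in the real format).
fn read_one_file<R: Read + Seek>(
    archive: &mut R,
    index: &HashMap<String, Entry>,
    path: &str,
) -> std::io::Result<Vec<u8>> {
    let entry = index.get(path).ok_or_else(|| {
        std::io::Error::new(std::io::ErrorKind::NotFound, "no such file in archive")
    })?;
    archive.seek(SeekFrom::Start(entry.block_offset))?;
    let mut block = vec![0u8; entry.block_len as usize];
    archive.read_exact(&mut block)?;
    Ok(block)
}

fn main() -> std::io::Result<()> {
    // Fake archive: two "blocks" laid out back to back.
    let mut archive = Cursor::new(b"hello world".to_vec());
    let mut index = HashMap::new();
    index.insert("greeting.txt".to_string(), Entry { block_offset: 0, block_len: 5 });
    index.insert("object.txt".to_string(), Entry { block_offset: 6, block_len: 5 });

    let data = read_one_file(&mut archive, &index, "object.txt")?;
    println!("{}", String::from_utf8_lossy(&data)); // prints "world"
    Ok(())
}
```

The key property: the cost of the read is proportional to one block, not the whole archive.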
## Comparison
| Format | Compression | Selective Access |
|---|---|---|
| ZIP | weaker (per-file) | fast |
| tar+zstd | strong | slow |
| ARCX | strong | fast |
## Tradeoffs
Unlike tar, ARCX is not designed for streaming: the manifest is written at the end of the archive, so the archive must be complete before it can be read.
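A minimal sketch of why the manifest-at-the-end layout rules out streaming reads. It assumes a hypothetical 16-byte footer (manifest offset + length, little-endian), similar in spirit to zip's end-of-central-directory record; ARCX's actual trailer layout may differ.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

// Hypothetical 16-byte footer: manifest offset + manifest length,
// both little-endian u64. ARCX's real trailer layout may differ.
const FOOTER_LEN: i64 = 16;

fn read_manifest<R: Read + Seek>(archive: &mut R) -> std::io::Result<Vec<u8>> {
    // The manifest's location is only known once the archive is complete,
    // which is why an ARCX archive cannot be consumed as a stream:
    // the reader must seek to the end first.
    archive.seek(SeekFrom::End(-FOOTER_LEN))?;
    let mut footer = [0u8; 16];
    archive.read_exact(&mut footer)?;
    let offset = u64::from_le_bytes(footer[0..8].try_into().unwrap());
    let len = u64::from_le_bytes(footer[8..16].try_into().unwrap());

    archive.seek(SeekFrom::Start(offset))?;
    let mut manifest = vec![0u8; len as usize];
    archive.read_exact(&mut manifest)?;
    Ok(manifest)
}

fn main() -> std::io::Result<()> {
    // Fake archive: 4 data bytes, an 8-byte "manifest", then the footer.
    let mut bytes = b"DATA".to_vec();
    let manifest_offset = bytes.len() as u64;
    bytes.extend_from_slice(b"MANIFEST");
    bytes.extend_from_slice(&manifest_offset.to_le_bytes());
    bytes.extend_from_slice(&8u64.to_le_bytes());

    let manifest = read_manifest(&mut Cursor::new(bytes))?;
    println!("{}", String::from_utf8_lossy(&manifest)); // prints "MANIFEST"
    Ok(())
}
```

The upside of this tradeoff is that the writer can emit blocks in one pass and only has to buffer the index, not the data.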
## Current limitations
- Remote/S3 range-read workflows not fully benchmarked yet
- Metadata/index overhead still being optimized for very large file counts
- Full extraction benchmarks in Rust are still in progress
## Feedback
Still early -- feedback welcome.