DEV Community

Ryan

3 Ways to ZIP Files Stored on Cloudflare R2

You have files sitting in Cloudflare R2 and a user just clicked "Download All." Now what?

R2 doesn't have a built-in "zip these objects" operation. You need to figure it out yourself. After building a file processing API that has archived 550K+ files and 10TB+ on R2, here are the three approaches I've found — each with very different trade-offs.


Approach 1: Download Locally, ZIP, Re-upload

The most obvious approach. Pull files down, zip them on your server, upload the archive back to R2 (or serve it directly).

// Pseudocode — r2.get() stands in for your S3 client or Worker bucket binding
const objects = await Promise.all(
  keys.map(key => r2.get(key))
);
const zip = new JSZip();
// The object key doubles as the path inside the archive
objects.forEach((obj, i) => zip.file(keys[i], obj.body));
const archive = await zip.generateAsync({ type: 'nodebuffer' });
// Serve the buffer directly, or upload it back to R2

Pros:

  • Simple to implement
  • Works with any ZIP library (JSZip, archiver, etc.)
  • Full control over the ZIP structure

Cons:

  • Memory killer. You're buffering everything. 100 files × 50MB = 5GB in RAM
  • Slow. Download all → process → upload/serve is sequential
  • Egress costs if your server isn't on Cloudflare. Pulling files from R2 is free (R2 has zero egress fees), but once the ZIP lives on your AWS/GCP server, serving it to the user or uploading it back to R2 means paying your cloud provider's egress fees
  • Scaling requires extra infrastructure. You could run this on Fargate or similar, but you'd need to build the queue, orchestration, and autoscaling yourself — it's no longer "just zip these files"

Best for: Small archives (< 100MB total), infrequent use, prototyping.


Approach 2: Stream a ZIP in a Cloudflare Worker

Instead of buffering, you can construct a ZIP archive as a stream directly inside a Cloudflare Worker. The Worker fetches each file from R2 and pipes it into a ZIP stream that goes straight to the client.

// Simplified concept — writeLocalFileHeader, writeDataDescriptor,
// writeCentralDirectory, and crc32 are your own ZIP-spec helpers
export default {
  async fetch(request, env, ctx) {
    const keys = ['file1.pdf', 'file2.jpg', 'file3.csv'];

    const { readable, writable } = new TransformStream();
    const writer = writable.getWriter();

    // Write the archive in the background so the Response starts streaming immediately
    ctx.waitUntil((async () => {
      const entries = [];
      for (const key of keys) {
        const obj = await env.BUCKET.get(key);
        await writeLocalFileHeader(writer, key); // sizes unknown at this point

        // Stream file data through, computing CRC32 on the fly
        const reader = obj.body.getReader();
        let crc = 0;
        while (true) {
          const { done, value } = await reader.read();
          if (done) break;
          crc = crc32(crc, value);
          await writer.write(value);
        }

        await writeDataDescriptor(writer, crc, obj.size);
        entries.push({ key, crc, size: obj.size });
      }

      // The central directory comes last and references every entry
      await writeCentralDirectory(writer, entries);
      await writer.close();
    })());

    return new Response(readable, {
      headers: { 'Content-Type': 'application/zip' }
    });
  }
};
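To give a sense of what "implement ZIP yourself" means in practice, here is a minimal sketch of a local-file-header helper like the `writeLocalFileHeader` used above. It covers only the simplest case — a "stored" (uncompressed) entry with bit 3 set so the CRC-32 and sizes can follow later in a data descriptor — and zeroes the timestamp fields for brevity:

```javascript
// Hypothetical helper: builds a ZIP local file header for a "stored" entry.
// CRC-32 and sizes are left at 0 because flag bit 3 says they arrive later
// in a data descriptor; bit 11 marks the filename as UTF-8.
function localFileHeader(filename) {
  const name = new TextEncoder().encode(filename);
  const buf = new Uint8Array(30 + name.length);
  const view = new DataView(buf.buffer);
  view.setUint32(0, 0x04034b50, true);   // local file header signature
  view.setUint16(4, 20, true);           // version needed to extract (2.0)
  view.setUint16(6, 0x0808, true);       // flags: bit 3 (data descriptor) + bit 11 (UTF-8)
  view.setUint16(8, 0, true);            // compression method: 0 = stored
  view.setUint16(10, 0, true);           // mod time (zeroed for brevity)
  view.setUint16(12, 0, true);           // mod date (zeroed for brevity)
  view.setUint32(14, 0, true);           // CRC-32: unknown yet
  view.setUint32(18, 0, true);           // compressed size: unknown yet
  view.setUint32(22, 0, true);           // uncompressed size: unknown yet
  view.setUint16(26, name.length, true); // filename length
  view.setUint16(28, 0, true);           // extra field length
  buf.set(name, 30);                     // filename bytes follow the fixed header
  return buf;
}
```

And that's the easy record — the central directory entry repeats most of these fields plus offsets, and ZIP64 swaps several of them for extra-field extensions once you cross 4GB.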

Pros:

  • Constant memory usage. Only one file chunk in memory at a time
  • Zero egress fees. R2 → Worker → client, all within Cloudflare's network
  • Streaming. Client starts downloading immediately, no waiting for the full archive
  • Horizontal scaling. Workers handle many concurrent requests naturally — high throughput isn't a problem

Cons:

  • You have to implement ZIP yourself. Local file headers, data descriptors, CRC32, central directory, ZIP64 extensions — it's a lot of spec to get right
  • 128MB Worker memory limit. You can't cheat with buffering even if you wanted to
  • Per-archive size is limited. Workers have a 15-minute wall-clock limit and a per-invocation subrequest cap, so large archives (tens of GB+) won't complete in a single run. Working around this means building a checkpoint/resume system — serializing CRC32 state mid-computation, byte offsets, and multipart upload state, then resuming exactly where you left off
  • Error handling is brutal. If file #500 of 1000 fails mid-stream, you've already sent 499 files to the client. The HTTP response is in-flight — you can't restart or send an error code. The client just gets a truncated ZIP
  • CRC32 must be computed on-the-fly since you can't seek back to update headers (Data Descriptors solve this, but not all ZIP readers support them well)
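The on-the-fly CRC32 is the most tractable of these pieces. A table-driven sketch of the `crc32` helper used in the Worker code above — the pre/post XOR conditioning is what lets you call it chunk by chunk and still get the standard checksum:

```javascript
// Streaming CRC-32 (IEEE polynomial, reflected) — the checksum every ZIP entry needs.
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  }
  CRC_TABLE[n] = c;
}

// Start with crc = 0, feed each chunk, and pass the return value back in
// for the next chunk; the final return value is the entry's CRC-32.
function crc32(crc, chunk) {
  let c = crc ^ 0xffffffff; // undo the final XOR to recover internal state
  for (let i = 0; i < chunk.length; i++) {
    c = CRC_TABLE[(c ^ chunk[i]) & 0xff] ^ (c >>> 8);
  }
  return (c ^ 0xffffffff) >>> 0;
}
```

The hard part isn't the math — it's that if you ever need checkpoint/resume, this intermediate state has to be serialized and restored byte-for-byte.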

Best for: Production systems that need to handle large archives at scale, if you're willing to invest in the implementation.


Approach 3: Use a ZIP API Service

Instead of building and maintaining streaming ZIP infrastructure yourself, use an API that handles it for you. You send a list of R2 URLs (or presigned URLs), and get back a ZIP.

curl -X POST https://api.eazip.io/jobs \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      { "url": "https://your-bucket.r2.dev/file1.pdf" },
      { "url": "https://your-bucket.r2.dev/file2.jpg" },
      { "url": "https://your-bucket.r2.dev/file3.csv" }
    ]
  }'

The service handles streaming, CRC32, ZIP64, error recovery — everything from Approach 2 — so you don't have to.
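The same request from JavaScript — a sketch assuming only the endpoint and payload shape shown in the curl example (the response handling assumes the API returns JSON, which isn't shown above):

```javascript
// Build the request body shown in the curl example: { files: [{ url }, ...] }
function zipJobPayload(urls) {
  return JSON.stringify({ files: urls.map((url) => ({ url })) });
}

// Submit a ZIP job; apiKey and urls are your own values
async function createZipJob(apiKey, urls) {
  const res = await fetch('https://api.eazip.io/jobs', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: zipJobPayload(urls),
  });
  return res.json(); // assumed JSON response
}
```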

Pros:

  • One API call. No ZIP implementation to build or maintain
  • Handles edge cases you don't want to think about (ZIP64 for large files, Data Descriptors, checkpoint/resume for failures)
  • Zero egress if the service also runs on Cloudflare's network
  • Scales to 5,000+ files per archive, up to 50GB

Cons:

  • Third-party dependency. You're relying on an external service
  • Cost. Free tier exists but large-scale usage has costs
  • Less control over ZIP structure details

Best for: Teams that want ZIP functionality without building ZIP infrastructure. Ship in an afternoon instead of a sprint.

Full disclosure: I built Eazip because I went through Approach 2 myself and realized most teams shouldn't have to.


Comparison

| | Download + ZIP | Stream in Worker | ZIP API |
| --- | --- | --- | --- |
| Memory | O(total size) | O(chunk size) | N/A |
| Egress cost | Depends on server location | $0 | $0 |
| Max archive size | Limited by server RAM/disk | Limited by wall clock (15 min) | 50GB |
| Implementation time | Hours | Weeks | Minutes |
| Maintenance | Low | High (ZIP spec edge cases) | None |
| Error recovery | Easy (retry all) | Hard (mid-stream failures) | Built-in |

Which Should You Pick?

Prototyping or small files? → Approach 1. Just use JSZip and move on.

Production with scale requirements? → Approach 2 if you want full control and have the engineering bandwidth, or Approach 3 if you'd rather ship the feature and focus on your core product.

The reality is that ZIP is a deceptively complex format. What starts as "just zip these files" turns into weeks of handling CRC32 streaming, ZIP64 thresholds, Data Descriptor compatibility, and mid-stream error recovery. I learned this the hard way after archiving 10TB+ of files.


What's your approach? Have you tried something different? Let me know in the comments.
