DEV Community

Ryan

3 Ways to ZIP Files Stored on Cloudflare R2

You have files sitting in Cloudflare R2 and a user just clicked "Download All." Now what?

R2 doesn't have a built-in "zip these objects" operation. You need to figure it out yourself. After building a file processing API that has archived 550K+ files and 10TB+ on R2, here are the three approaches I've found — each with very different trade-offs.


Approach 1: Download Locally, ZIP, Re-upload

The most obvious approach. Pull files down, zip them on your server, upload the archive back to R2 (or serve it directly).

// Pseudocode — r2.get() stands in for your S3 client or Worker bucket binding
const objects = await Promise.all(
  keys.map(key => r2.get(key))
);
const zip = new JSZip();
// The object key doubles as the path inside the archive
objects.forEach((obj, i) => zip.file(keys[i], obj.body));
const archive = await zip.generateAsync({ type: 'nodebuffer' });
// Serve the buffer directly, or upload it back to R2

Pros:

  • Simple to implement
  • Works with any ZIP library (JSZip, archiver, etc.)
  • Full control over the ZIP structure

Cons:

  • Memory killer. You're buffering everything. 100 files × 50MB = 5GB in RAM
  • Slow. Download all → process → upload/serve is sequential
  • Egress costs if your server isn't on Cloudflare. Pulling files from R2 is free (R2 has zero egress fees), but once the ZIP lives on your AWS/GCP server, serving it to the user or uploading it back to R2 means paying your cloud provider's egress fees
  • Scaling requires extra infrastructure. You could run this on Fargate or similar, but you'd need to build the queue, orchestration, and autoscaling yourself — it's no longer "just zip these files"

Best for: Small archives (< 100MB total), infrequent use, prototyping.


Approach 2: Stream a ZIP in a Cloudflare Worker

Instead of buffering, you can construct a ZIP archive as a stream directly inside a Cloudflare Worker. The Worker fetches each file from R2 and pipes it into a ZIP stream that goes straight to the client.

// Simplified concept — writeLocalFileHeader, writeDataDescriptor,
// writeCentralDirectory, and crc32 are your own ZIP-spec helpers
export default {
  async fetch(request, env, ctx) {
    const keys = ['file1.pdf', 'file2.jpg', 'file3.csv'];

    const { readable, writable } = new TransformStream();
    const writer = writable.getWriter();

    // Write the archive in the background so the Response starts streaming immediately
    ctx.waitUntil((async () => {
      const entries = [];
      for (const key of keys) {
        const obj = await env.BUCKET.get(key);
        await writeLocalFileHeader(writer, key); // sizes unknown at this point

        // Stream file data through, computing CRC32 on the fly
        const reader = obj.body.getReader();
        let crc = 0;
        while (true) {
          const { done, value } = await reader.read();
          if (done) break;
          crc = crc32(crc, value);
          await writer.write(value);
        }

        await writeDataDescriptor(writer, crc, obj.size);
        entries.push({ key, crc, size: obj.size });
      }

      // The central directory comes last and references every entry
      await writeCentralDirectory(writer, entries);
      await writer.close();
    })());

    return new Response(readable, {
      headers: { 'Content-Type': 'application/zip' }
    });
  }
};
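To give a sense of what "implement ZIP yourself" means in practice, here is a minimal sketch of a local-file-header helper like the `writeLocalFileHeader` used above. It covers only the simplest case — a "stored" (uncompressed) entry with bit 3 set so the CRC-32 and sizes can follow later in a data descriptor — and zeroes the timestamp fields for brevity:

```javascript
// Hypothetical helper: builds a ZIP local file header for a "stored" entry.
// CRC-32 and sizes are left at 0 because flag bit 3 says they arrive later
// in a data descriptor; bit 11 marks the filename as UTF-8.
function localFileHeader(filename) {
  const name = new TextEncoder().encode(filename);
  const buf = new Uint8Array(30 + name.length);
  const view = new DataView(buf.buffer);
  view.setUint32(0, 0x04034b50, true);   // local file header signature
  view.setUint16(4, 20, true);           // version needed to extract (2.0)
  view.setUint16(6, 0x0808, true);       // flags: bit 3 (data descriptor) + bit 11 (UTF-8)
  view.setUint16(8, 0, true);            // compression method: 0 = stored
  view.setUint16(10, 0, true);           // mod time (zeroed for brevity)
  view.setUint16(12, 0, true);           // mod date (zeroed for brevity)
  view.setUint32(14, 0, true);           // CRC-32: unknown yet
  view.setUint32(18, 0, true);           // compressed size: unknown yet
  view.setUint32(22, 0, true);           // uncompressed size: unknown yet
  view.setUint16(26, name.length, true); // filename length
  view.setUint16(28, 0, true);           // extra field length
  buf.set(name, 30);                     // filename bytes follow the fixed header
  return buf;
}
```

And that's the easy record — the central directory entry repeats most of these fields plus offsets, and ZIP64 swaps several of them for extra-field extensions once you cross 4GB.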

Pros:

  • Constant memory usage. Only one file chunk in memory at a time
  • Zero egress fees. R2 → Worker → client, all within Cloudflare's network
  • Streaming. Client starts downloading immediately, no waiting for the full archive
  • Horizontal scaling. Workers handle many concurrent requests naturally — high throughput isn't a problem

Cons:

  • You have to implement ZIP yourself. Local file headers, data descriptors, CRC32, central directory, ZIP64 extensions — it's a lot of spec to get right
  • 128MB Worker memory limit. You can't cheat with buffering even if you wanted to
  • Per-archive size is limited. Workers have a 15-minute wall-clock limit and a per-invocation subrequest cap, so large archives (tens of GB+) won't complete in a single run. Working around this means building a checkpoint/resume system — serializing CRC32 state mid-computation, byte offsets, and multipart upload state, then resuming exactly where you left off
  • Error handling is brutal. If file #500 of 1000 fails mid-stream, you've already sent 499 files to the client. The HTTP response is in-flight — you can't restart or send an error code. The client just gets a truncated ZIP
  • CRC32 must be computed on-the-fly since you can't seek back to update headers (Data Descriptors solve this, but not all ZIP readers support them well)
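The on-the-fly CRC32 is the most tractable of these pieces. A table-driven sketch of the `crc32` helper used in the Worker code above — the pre/post XOR conditioning is what lets you call it chunk by chunk and still get the standard checksum:

```javascript
// Streaming CRC-32 (IEEE polynomial, reflected) — the checksum every ZIP entry needs.
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  }
  CRC_TABLE[n] = c;
}

// Start with crc = 0, feed each chunk, and pass the return value back in
// for the next chunk; the final return value is the entry's CRC-32.
function crc32(crc, chunk) {
  let c = crc ^ 0xffffffff; // undo the final XOR to recover internal state
  for (let i = 0; i < chunk.length; i++) {
    c = CRC_TABLE[(c ^ chunk[i]) & 0xff] ^ (c >>> 8);
  }
  return (c ^ 0xffffffff) >>> 0;
}
```

The hard part isn't the math — it's that if you ever need checkpoint/resume, this intermediate state has to be serialized and restored byte-for-byte.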

Best for: Production systems that need to handle large archives at scale, if you're willing to invest in the implementation.


Approach 3: Use a ZIP API Service

Instead of building and maintaining streaming ZIP infrastructure yourself, use an API that handles it for you. You send a list of R2 URLs (or presigned URLs), and get back a ZIP.

curl -X POST https://api.eazip.io/jobs \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      { "url": "https://your-bucket.r2.dev/file1.pdf" },
      { "url": "https://your-bucket.r2.dev/file2.jpg" },
      { "url": "https://your-bucket.r2.dev/file3.csv" }
    ]
  }'

The service handles streaming, CRC32, ZIP64, error recovery — everything from Approach 2 — so you don't have to.
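The same request from JavaScript — a sketch assuming only the endpoint and payload shape shown in the curl example (the response handling assumes the API returns JSON, which isn't shown above):

```javascript
// Build the request body shown in the curl example: { files: [{ url }, ...] }
function zipJobPayload(urls) {
  return JSON.stringify({ files: urls.map((url) => ({ url })) });
}

// Submit a ZIP job; apiKey and urls are your own values
async function createZipJob(apiKey, urls) {
  const res = await fetch('https://api.eazip.io/jobs', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: zipJobPayload(urls),
  });
  return res.json(); // assumed JSON response
}
```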

Pros:

  • One API call. No ZIP implementation to build or maintain
  • Handles edge cases you don't want to think about (ZIP64 for large files, Data Descriptors, checkpoint/resume for failures)
  • Zero egress if the service also runs on Cloudflare's network
  • Scales to 5,000+ files per archive, up to 50GB

Cons:

  • Third-party dependency. You're relying on an external service
  • Cost. Free tier exists but large-scale usage has costs
  • Less control over ZIP structure details

Best for: Teams that want ZIP functionality without building ZIP infrastructure. Ship in an afternoon instead of a sprint.

Full disclosure: I built Eazip because I went through Approach 2 myself and realized most teams shouldn't have to.


Comparison

| | Download + ZIP | Stream in Worker | ZIP API |
| --- | --- | --- | --- |
| Memory | O(total size) | O(chunk size) | N/A |
| Egress cost | Depends on server location | $0 | $0 |
| Max archive size | Limited by server RAM/disk | Limited by wall clock (15 min) | 50GB |
| Implementation time | Hours | Weeks | Minutes |
| Maintenance | Low | High (ZIP spec edge cases) | None |
| Error recovery | Easy (retry all) | Hard (mid-stream failures) | Built-in |

Which Should You Pick?

Prototyping or small files? → Approach 1. Just use JSZip and move on.

Production with scale requirements? → Approach 2 if you want full control and have the engineering bandwidth, or Approach 3 if you'd rather ship the feature and focus on your core product.

The reality is that ZIP is a deceptively complex format. What starts as "just zip these files" turns into weeks of handling CRC32 streaming, ZIP64 thresholds, Data Descriptor compatibility, and mid-stream error recovery. I learned this the hard way after archiving 10TB+ of files.


What's your approach? Have you tried something different? Let me know in the comments.
