DEV Community

Tatsuya Nishimura

Save on DuckDB + S3 Transfer Costs

TL;DR

Use Cloudflare R2, or run DuckDB on EC2 in the same region as your S3 bucket with an S3 Gateway Endpoint enabled.

A quick note

Stick with Parquet. Everything below relies on its columnar layout and footer metadata.

How much data actually gets transferred?

When you query a Parquet file on S3 through DuckDB, the whole file doesn't get downloaded. Instead, DuckDB uses HTTP Range Requests to grab only the bytes it needs.

The mechanics

DuckDB fetches data in two passes:

  1. Metadata: Range-request just the metadata footer at the end of the Parquet file
  2. Data: Range-request only the columns and row groups needed by your query

"DuckDB always uses range requests, firstly to query the metadata only, then to fetch the required columns."
PR #5405: HTTP parquet optimizations
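To make the first pass concrete, here is a minimal sketch of how a reader can locate the metadata with range requests. It relies on the Parquet file layout, where the file ends with a 4-byte little-endian footer length followed by the magic bytes PAR1. The function names are made up for illustration, and this is not DuckDB's actual code, which may fetch a larger tail in a single request.

```python
import struct

PARQUET_MAGIC = b"PAR1"
TAIL_SIZE = 8  # 4-byte little-endian footer length + 4-byte magic


def tail_range_header() -> str:
    """Suffix range that asks for only the last 8 bytes of the object."""
    return f"bytes=-{TAIL_SIZE}"


def footer_range_header(tail: bytes, file_size: int) -> str:
    """Given the last 8 bytes, build the Range header covering the footer metadata."""
    if len(tail) != TAIL_SIZE or tail[4:] != PARQUET_MAGIC:
        raise ValueError("tail does not look like the end of a Parquet file")
    (footer_len,) = struct.unpack("<I", tail[:4])
    end = file_size - TAIL_SIZE - 1   # last footer byte (HTTP ranges are inclusive)
    start = end - footer_len + 1      # first footer byte
    if start < len(PARQUET_MAGIC):    # the file also starts with the 4-byte magic
        raise ValueError("footer length is larger than the file allows")
    return f"bytes={start}-{end}"


# Hypothetical 10,000-byte file whose footer is 1,000 bytes long.
example_tail = struct.pack("<I", 1000) + PARQUET_MAGIC
print(tail_range_header())
print(footer_range_header(example_tail, 10_000))
```

The point is that two small requests are enough to learn where every column chunk lives, regardless of how large the file is.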

A concrete example

SELECT column_a FROM 's3://bucket/file.parquet';

DuckDB downloads only the bytes containing column_a. So even with a 10GB file, if column_a is just 100MB, you only transfer ~100MB.

Even better, sometimes you transfer almost nothing:

SELECT count(*) FROM 's3://bucket/file.parquet';

Parquet metadata includes row counts, so DuckDB can return your result from the metadata alone, without reading any column data.
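The arithmetic behind both examples can be sketched in a few lines. The row-group layout, the column sizes, and the 64 KB footer below are invented numbers for illustration; real files record these sizes in their metadata.

```python
from typing import Dict, Iterable, List

# Hypothetical layout: one dict per row group, mapping column name to
# the compressed size of that column chunk in bytes.
RowGroup = Dict[str, int]


def estimate_transfer_bytes(
    row_groups: List[RowGroup], columns: Iterable[str], footer_bytes: int
) -> int:
    """Footer bytes plus the column chunks the query actually references."""
    wanted = set(columns)
    data_bytes = sum(
        size
        for row_group in row_groups
        for name, size in row_group.items()
        if name in wanted
    )
    return footer_bytes + data_bytes


# A roughly 10GB file: column_a totals 100MB across two row groups.
layout = [
    {"column_a": 50_000_000, "column_b": 4_950_000_000},
    {"column_a": 50_000_000, "column_b": 4_950_000_000},
]

print(estimate_transfer_bytes(layout, ["column_a"], footer_bytes=64_000))
print(estimate_transfer_bytes(layout, [], footer_bytes=64_000))  # count(*) case
```

With no columns referenced, as in the count(*) query, only the footer is counted.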

Reference: DuckDB Official Documentation - HTTP(S) Support

Filter and projection pushdown

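DuckDB pushes filters and projections down into the Parquet scan: it requests only the columns your query references, and it uses the min/max statistics stored per row group to skip row groups that cannot match your filter. So even less data gets transferred.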

"We're able to do partial reads via Range requests actually, so it should be fairly efficient."
Discussion #4559
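A common way for a Parquet reader to skip data is to compare the filter against the min/max statistics kept for each row group. The sketch below shows that idea for a simple range filter. The RowGroupStats class and the values in it are hypothetical, and DuckDB's real logic handles many more cases.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    index: int      # position of the row group in the file
    min_value: int  # smallest value of the filtered column in this row group
    max_value: int  # largest value of the filtered column in this row group


def row_groups_to_read(stats: List[RowGroupStats], lo: int, hi: int) -> List[int]:
    """Indexes of row groups whose [min, max] overlaps the filter range [lo, hi]."""
    return [s.index for s in stats if s.max_value >= lo and s.min_value <= hi]


stats = [
    RowGroupStats(index=0, min_value=1, max_value=100),
    RowGroupStats(index=1, min_value=101, max_value=200),
    RowGroupStats(index=2, min_value=201, max_value=300),
]

# Something like WHERE id BETWEEN 150 AND 180 only needs the middle row group.
print(row_groups_to_read(stats, 150, 180))
```

Row groups that fail the overlap check are never requested, which is why sorted or clustered data tends to transfer far fewer bytes.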

So why do S3 bills get so nasty?

Here's the twist: data transfer between EC2 and S3 within the same region is free.

Yet somehow people end up with shocking bills. What's going on?

1. No S3 Gateway Endpoint

Without a Gateway Endpoint, traffic from your VPC to S3 gets routed through NAT Gateway or the internet gateway.

  • Via NAT Gateway: You pay a $0.045/GB data processing charge (us-east-1 rate), on top of the gateway's hourly fee
  • Via Gateway Endpoint: Free

"There is no additional charge for using gateway endpoints."
AWS Official Documentation - Gateway endpoints for Amazon S3

2. Accessing across regions

If your S3 bucket and EC2 are in different regions, AWS charges you $0.01–$0.02/GB for the privilege.

3. Going out to the internet

Querying from your laptop or anything outside AWS? You pay $0.09/GB and up for internet egress.
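To see how quickly these three paths diverge, here is a small calculator using the per-GB rates quoted above. It is a rough sketch: the path names are my own labels, the cross-region rate uses the upper $0.02 figure, and hourly fees, free tiers, and volume discounts are ignored.

```python
# Per-GB rates taken from this article; real bills depend on region and volume.
PRICE_PER_GB = {
    "gateway_endpoint": 0.0,   # same region, via S3 Gateway Endpoint
    "nat_gateway": 0.045,      # same region, routed through a NAT Gateway
    "cross_region": 0.02,      # bucket and EC2 in different regions (upper figure)
    "internet": 0.09,          # laptop or anything outside AWS
}


def monthly_transfer_cost(gb_per_month: float, path: str) -> float:
    """Transfer charge in dollars for one month, rounded to cents."""
    return round(gb_per_month * PRICE_PER_GB[path], 2)


# Scanning 1TB (1,000GB) of Parquet per month over each path.
for path in PRICE_PER_GB:
    print(f"{path:>17}: ${monthly_transfer_cost(1000, path):.2f}")
```

The same 1TB of scans costs nothing through a Gateway Endpoint and $90 over the internet, so the network path matters more than the query itself.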

The fix

EC2 + S3 Gateway Endpoint in the same region = zero transfer charges.

Querying Parquet from EC2 in your bucket's region beats downloading everything locally by a mile. The bigger your data, the bigger the savings.

The downside? Standing up and configuring EC2 every time gets old fast.

Other object storage options

Let's compare alternatives. The key metric is egress—that's what kills your budget.

Note: Ingress (uploading) is always free. With object storage, you pay to get your data back out.

Cloudflare R2

Cloudflare R2: free egress, full stop.

Item                 Price              Free Tier
Storage              $0.015/GB/month    10GB/month
Class A operations   $4.50/million      1M requests
Class B operations   $0.36/million      10M requests
Egress               Free               Unlimited

If you want to ditch the transfer bill entirely, R2 is your answer.

Reference: Cloudflare R2 Pricing
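Free egress doesn't mean a zero bill, because storage and operations are still metered. Here is a rough estimator built from the table above. The function name and the workload numbers are made up, and I'm assuming DuckDB's range reads are billed as Class B operations, so check the pricing page for your case.

```python
def r2_monthly_cost(storage_gb: float, class_a_ops: int, class_b_ops: int) -> float:
    """Estimated monthly R2 bill in dollars, after the free tier. Egress is free."""
    storage = max(storage_gb - 10, 0) * 0.015                       # first 10GB free
    class_a = max(class_a_ops - 1_000_000, 0) / 1_000_000 * 4.50    # first 1M free
    class_b = max(class_b_ops - 10_000_000, 0) / 1_000_000 * 0.36   # first 10M free
    return round(storage + class_a + class_b, 2)


# Hypothetical workload: 1,010GB stored, 2M writes, 20M reads per month.
print(r2_monthly_cost(1010, 2_000_000, 20_000_000))

# A small project that stays inside the free tier.
print(r2_monthly_cost(5, 500_000, 5_000_000))
```

Since each range request counts as an operation, a query that touches many small files can run up request charges even though the bytes themselves are free.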

Backblaze B2

Backblaze B2 keeps egress essentially free too.

Item             Price
Storage          $6/TB/month ($0.006/GB/month)
Egress           Free (up to 3× your storage/month)
Overage Egress   $0.01/GB

Store 100GB, download 300GB free per month. Plus it's S3-compatible.

Reference: Backblaze B2 Pricing
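The 3× rule is easy to get wrong in your head, so here is a small sketch of it using the prices in the table. The function name is mine, and it ignores request charges and any minimums.

```python
def b2_monthly_cost(storage_gb: float, egress_gb: float) -> float:
    """Estimated monthly B2 bill in dollars: storage plus egress beyond 3x storage."""
    storage = storage_gb * 0.006
    free_egress_gb = 3 * storage_gb
    overage = max(egress_gb - free_egress_gb, 0) * 0.01
    return round(storage + overage, 2)


# Store 100GB and download 300GB: egress stays inside the free allowance.
print(b2_monthly_cost(100, 300))

# Download 500GB instead: 200GB of overage is billed at $0.01/GB.
print(b2_monthly_cost(100, 500))
```

If your queries regularly scan more than three times what you store, the overage starts to apply, though at $0.01/GB it stays well below the internet egress rates of the big clouds.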

Google Cloud Storage

GCS hands out free transfers between services in the same region.

Transfer Type                     Price
Same zone (private IP)            Free
Same region (GCS ↔ GCE, etc.)     Free
Different zones (same region)     $0.01/GB
Inter-region (e.g., US zones)     $0.02/GB
Outbound to internet              $0.12/GB

Run DuckDB on a GCE instance in the same region as your data and you pay nothing.

Reference: Google Cloud Storage Pricing

Azure Blob Storage

Azure does the same for intra-region transfers.

Transfer Type                        Price
Same Availability Zone               Free
Inter-region (e.g., US to Canada)    $0.02/GB
Outbound to internet                 $0.087/GB (first 100GB free/month)

Spin up an Azure VM in the same region as your Blob Storage account and transfers are free.

Reference: Azure Blob Storage Pricing

Quick comparison

Service          Storage (per GB/month)   Internet egress            Intra-region transfer
AWS S3           $0.023                   $0.09/GB                   Free (with Gateway Endpoint)
Cloudflare R2    $0.015                   Free                       N/A
Backblaze B2     $0.006                   Free (3× storage/month)    N/A
GCS              $0.020                   $0.12/GB                   Free
Azure Blob       $0.018                   $0.087/GB                  Free

For DuckDB queries: Cloudflare R2 eliminates egress fees entirely, and Backblaze B2 does so up to 3× your stored data per month.
From a cloud VM: Use that cloud's storage in the same region and pay nothing.
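If you query from outside the cloud (a laptop, say), you can plug your own numbers into the comparison. The sketch below ranks the five services using only the figures from the table. It ignores request charges and every free tier except B2's 3× allowance, so treat the output as a rough ordering rather than a quote.

```python
from typing import Dict, List, Tuple

# (storage $/GB/month, internet egress $/GB, free egress as a multiple of stored GB)
PRICES: Dict[str, Tuple[float, float, float]] = {
    "AWS S3": (0.023, 0.09, 0.0),
    "Cloudflare R2": (0.015, 0.0, 0.0),
    "Backblaze B2": (0.006, 0.01, 3.0),  # $0.01/GB applies only past 3x storage
    "GCS": (0.020, 0.12, 0.0),
    "Azure Blob": (0.018, 0.087, 0.0),
}


def rank_by_monthly_cost(storage_gb: float, egress_gb: float) -> List[Tuple[str, float]]:
    """Storage plus internet egress per service, cheapest first."""
    costs = []
    for name, (storage_price, egress_price, free_ratio) in PRICES.items():
        billable_gb = max(egress_gb - free_ratio * storage_gb, 0.0)
        total = storage_gb * storage_price + billable_gb * egress_price
        costs.append((name, round(total, 2)))
    return sorted(costs, key=lambda item: item[1])


# Hypothetical workload: 1TB stored, 2TB pulled over the internet each month.
for name, cost in rank_by_monthly_cost(1000, 2000):
    print(f"{name:<14} ${cost:.2f}")
```

For this workload the egress line dominates the bill on S3, GCS, and Azure, which is the whole argument for R2 or B2 when you can't run compute next to the data.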

Wrap up

Want zero egress charges with DuckDB? Pick R2, or Backblaze B2 if your downloads stay within 3× your stored data per month.

Running on a cloud VM? Pick that cloud's object storage, keep it in the same region, and you're fine. Setting up EC2 each time is annoying, but at least the transfer costs disappear.


I build observability tools with DuckDB + object storage.
