TL;DR
Use Cloudflare R2, or run DuckDB on EC2 in the same region as your S3 bucket with an S3 Gateway Endpoint enabled.
A quick note
Stick with Parquet.
How much data actually gets transferred?
When you query a Parquet file on S3 through DuckDB, the whole file doesn't get downloaded. Instead, DuckDB uses HTTP Range Requests to grab only the bytes it needs.
The mechanics
DuckDB fetches data in two passes:
- Metadata: Range-request just the metadata section of the Parquet file
- Data: Range-request only the columns and row groups needed by your query
"DuckDB always uses range requests, firstly to query the metadata only, then to fetch the required columns."
— PR #5405: HTTP parquet optimizations
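To make the two passes concrete, here is a minimal sketch of how a client could work out the `Range` header for the metadata pass. It relies on the Parquet file layout, where the last 8 bytes hold a 4-byte little-endian footer length followed by the `PAR1` magic. The function names are made up for illustration, and this is not DuckDB's actual code; DuckDB may read a larger tail in one request.

```python
import struct

PARQUET_MAGIC = b"PAR1"
TAIL_LEN = 8  # 4-byte footer length + 4-byte magic at the end of the file


def tail_range_header(file_size: int) -> str:
    """Range header for the last 8 bytes of the file (hypothetical helper)."""
    return f"bytes={file_size - TAIL_LEN}-{file_size - 1}"


def footer_range_header(file_size: int, tail: bytes) -> str:
    """Range header covering the footer metadata, given the 8-byte tail."""
    if len(tail) != TAIL_LEN or tail[-4:] != PARQUET_MAGIC:
        raise ValueError("not an unencrypted-footer Parquet file")
    # Footer length is stored little-endian just before the magic bytes.
    (footer_len,) = struct.unpack("<I", tail[:4])
    start = file_size - TAIL_LEN - footer_len
    end = file_size - TAIL_LEN - 1  # HTTP ranges are inclusive
    return f"bytes={start}-{end}"


# Example: a 1000-byte file whose footer is 100 bytes long.
example_tail = struct.pack("<I", 100) + PARQUET_MAGIC
print(tail_range_header(1000))
print(footer_range_header(1000, example_tail))
```

Once the footer is parsed, the client knows the byte offsets of every column chunk, which is what makes the second, data-only pass possible.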
A concrete example
SELECT column_a FROM 's3://bucket/file.parquet';
DuckDB downloads only the bytes containing column_a. So even with a 10GB file, if column_a is just 100MB, you only transfer ~100MB plus a small metadata read.
Even better, sometimes you transfer almost nothing:
SELECT count(*) FROM 's3://bucket/file.parquet';
Parquet metadata includes row counts, so DuckDB can return your result from the metadata alone, without reading any column data.
Reference: DuckDB Official Documentation - HTTP(S) Support
Filter and projection pushdown
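DuckDB pushes filters and projections down into the Parquet scan. S3 itself doesn't evaluate anything; instead, DuckDB uses the min/max statistics in the Parquet metadata to skip row groups that can't match your filter, so those bytes are never requested.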
"We're able to do partial reads via Range requests actually, so it should be fairly efficient."
— Discussion #4559
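As a rough illustration of why pushdown cuts transfer, the sketch below estimates the bytes a reader would need for a query of the form `SELECT columns WHERE filter_column > x`, using per-row-group min/max statistics. The `ColumnChunk` type, the sizes, and the column names are all hypothetical; this models the idea only and is not how DuckDB is implemented internally.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ColumnChunk:
    size_bytes: int
    min_value: float
    max_value: float


# One row group maps column name -> its chunk (hypothetical model).
RowGroup = Dict[str, ColumnChunk]


def bytes_to_fetch(row_groups: List[RowGroup], columns: List[str],
                   filter_column: str, greater_than: float) -> int:
    """Bytes needed for: SELECT columns WHERE filter_column > greater_than."""
    needed = set(columns) | {filter_column}  # projection pushdown
    total = 0
    for group in row_groups:
        # Filter pushdown: if the largest value in this row group cannot
        # satisfy the predicate, the whole row group is skipped.
        if group[filter_column].max_value <= greater_than:
            continue
        total += sum(group[name].size_bytes for name in needed)
    return total


layout = [
    {"ts": ColumnChunk(10, 0, 99), "value": ColumnChunk(100, 0, 1),
     "payload": ColumnChunk(1000, 0, 1)},
    {"ts": ColumnChunk(10, 100, 199), "value": ColumnChunk(100, 0, 1),
     "payload": ColumnChunk(1000, 0, 1)},
]
print(bytes_to_fetch(layout, ["value"], "ts", 150))
```

In this toy layout the first row group is skipped entirely and the wide `payload` column is never touched, which is why sorted or partitioned data tends to be cheaper to query.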
So why do S3 bills get so nasty?
Here's the odd part: data transfer between EC2 and S3 in the same region is free.
Yet somehow people end up with shocking bills. What's going on?
1. No S3 Gateway Endpoint
Without a Gateway Endpoint, traffic from your VPC to S3 gets routed through a NAT Gateway or an internet gateway.
- Via NAT Gateway: You pay $0.045/GB in data processing charges
- Via Gateway Endpoint: Free
"There is no additional charge for using gateway endpoints."
— AWS Official Documentation - Gateway endpoints for Amazon S3
2. Accessing across regions
If your S3 bucket and EC2 are in different regions, AWS charges you $0.01–$0.02/GB for the privilege.
3. Going out to the internet
Querying from your laptop or anything outside AWS? You pay $0.09/GB and up for internet egress.
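To see how quickly these paths diverge, here is a small back-of-the-envelope calculator. The per-GB rates are the ones quoted in this post (using the upper $0.02/GB figure for cross-region); real rates vary by region and change over time, and the path names are just labels chosen for this sketch.

```python
# USD per GB, taken from the figures quoted in this article.
# Assumption: cross-region uses the upper bound of the $0.01-$0.02 range.
RATES_USD_PER_GB = {
    "gateway_endpoint": 0.0,
    "nat_gateway": 0.045,
    "cross_region": 0.02,
    "internet": 0.09,
}


def transfer_cost_usd(gb_transferred: float, path: str) -> float:
    """Estimated data transfer cost for reading from S3 over a given path."""
    return gb_transferred * RATES_USD_PER_GB[path]


# Reading 1 TB (1000 GB) of Parquet column data over each path.
for path in RATES_USD_PER_GB:
    print(f"{path:>16}: ${transfer_cost_usd(1000, path):.2f}")
```

The same query workload costs nothing through a Gateway Endpoint and roughly $90 per TB from a laptop, which is usually where the surprise bills come from.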
The fix
EC2 + S3 Gateway Endpoint in the same region = zero transfer charges.
Querying Parquet from EC2 in your bucket's region beats downloading everything locally by a mile. The bigger your data, the bigger the savings.
The downside? Standing up and configuring EC2 every time gets old fast.
Other object storage options
Let's compare alternatives. The key metric is egress—that's what kills your budget.
Note: Ingress (uploading) is free on the major providers listed here. With object storage, you pay to get your data back out.
- AWS S3 Pricing - "Data Transfer IN To Amazon S3 From Internet: $0.00 per GB"
- GCS Pricing - "Network ingress: Free"
- Azure Bandwidth Pricing - "Data Transfer In: Free"
Cloudflare R2
Cloudflare R2: free egress, full stop.
| Item | Price | Free Tier |
|---|---|---|
| Storage | $0.015/GB/month | 10GB/month |
| Class A operations | $4.50/million | 1M requests |
| Class B operations | $0.36/million | 10M requests |
| Egress | Free | Unlimited |
If you want to ditch the transfer bill entirely, R2 is your answer.
Reference: Cloudflare R2 Pricing
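Free egress doesn't mean a $0 bill, since storage and operations are still metered. The sketch below estimates a monthly R2 bill from the table above. It assumes the free tier is subtracted before billing and that each read request DuckDB issues counts as one Class B operation; check the R2 pricing page for how your requests are actually classified.

```python
def r2_monthly_cost_usd(stored_gb: float, class_a_ops: int, class_b_ops: int) -> float:
    """Rough monthly R2 bill using the prices and free tier listed above."""
    # Assumption: free-tier amounts are deducted before anything is billed.
    storage = max(0.0, stored_gb - 10) * 0.015
    class_a = max(0, class_a_ops - 1_000_000) / 1_000_000 * 4.50
    class_b = max(0, class_b_ops - 10_000_000) / 1_000_000 * 0.36
    return storage + class_a + class_b  # egress is free, so no transfer term


# 110 GB stored, no writes, 20 million read requests in a month.
print(round(r2_monthly_cost_usd(110, 0, 20_000_000), 2))
```

Because DuckDB reads Parquet through many small range requests, the Class B line is the one to watch on query-heavy workloads.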
Backblaze B2
Backblaze B2 keeps egress essentially free too.
| Item | Price |
|---|---|
| Storage | $6/TB/month ($0.006/GB/month) |
| Egress | Free (up to 3× your storage/month) |
| Overage Egress | $0.01/GB |
Store 100GB, download 300GB free per month. Plus it's S3-compatible.
Reference: Backblaze B2 Pricing
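The 3× allowance is easy to reason about with a couple of lines of arithmetic. This sketch applies the figures from the table above and treats the allowance as a simple monthly cap; the function name is made up for illustration.

```python
def b2_egress_cost_usd(stored_gb: float, egress_gb: float) -> float:
    """Egress charge after the free allowance of 3x stored data per month."""
    free_allowance_gb = 3 * stored_gb
    billable_gb = max(0.0, egress_gb - free_allowance_gb)
    return billable_gb * 0.01  # overage rate from the table above


print(b2_egress_cost_usd(100, 300))  # within the allowance
print(b2_egress_cost_usd(100, 500))  # 200 GB over the allowance
```

Since DuckDB only pulls the columns and row groups it needs, staying under 3× your stored volume is realistic for many analytical workloads.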
Google Cloud Storage
GCS hands out free transfers between services in the same region.
| Transfer Type | Price |
|---|---|
| Same zone (private IP) | Free |
| Same region (GCS ↔ GCE, etc.) | Free |
| Different zones (same region) | $0.01/GB |
| Inter-region (e.g., between US regions) | $0.02/GB |
| Outbound to internet | $0.12/GB |
Run DuckDB on a GCE instance in the same region as your data and you pay nothing.
Reference: Google Cloud Storage Pricing
Azure Blob Storage
Azure does the same for intra-region transfers.
| Transfer Type | Price |
|---|---|
| Same Availability Zone | Free |
| Inter-region (e.g., US to Canada) | $0.02/GB |
| Outbound to internet | $0.087/GB (first 100GB free/month) |
Spin up an Azure VM in the same region as your Blob Storage account and transfers are free.
Reference: Azure Blob Storage Pricing
Quick comparison
| Service | Storage | Egress | Intra-region |
|---|---|---|---|
| AWS S3 | $0.023/GB | $0.09/GB | Free (with Gateway Endpoint) |
| Cloudflare R2 | $0.015/GB | Free | N/A |
| Backblaze B2 | $0.006/GB | Free (3× storage/month) | N/A |
| GCS | $0.020/GB | $0.12/GB | Free |
| Azure Blob | $0.018/GB | $0.087/GB | Free |
For DuckDB queries: Cloudflare R2 eliminates egress charges entirely, and Backblaze B2 does too as long as you stay under 3× your stored volume per month.
From a cloud VM: Use that cloud's storage in the same region and pay nothing.
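To compare the providers for a laptop-style workload (everything leaves over the internet), here is a rough estimator built from the comparison table. It ignores request fees, free tiers, and tiered pricing, and it models B2 as free up to 3× stored volume with a $0.01/GB overage, so treat the output as a ballpark only.

```python
# (storage USD/GB/month, internet egress USD/GB) from the table above.
# For Backblaze B2 the egress figure is the overage rate past 3x storage.
PROVIDERS = {
    "AWS S3": (0.023, 0.09),
    "Cloudflare R2": (0.015, 0.0),
    "Backblaze B2": (0.006, 0.01),
    "GCS": (0.020, 0.12),
    "Azure Blob": (0.018, 0.087),
}


def monthly_cost_usd(provider: str, stored_gb: float, internet_egress_gb: float) -> float:
    """Storage plus internet egress for one month (requests not included)."""
    storage_rate, egress_rate = PROVIDERS[provider]
    billable_gb = internet_egress_gb
    if provider == "Backblaze B2":
        billable_gb = max(0.0, internet_egress_gb - 3 * stored_gb)
    return stored_gb * storage_rate + billable_gb * egress_rate


# 1 TB stored, 500 GB queried out to the internet per month.
for name in PROVIDERS:
    print(f"{name:>14}: ${monthly_cost_usd(name, 1000, 500):.2f}")
```

If your queries run inside the same cloud and region as the data, set the egress figure to zero for that provider and only the storage term remains.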
Wrap up
Want zero egress charges with DuckDB? Pick R2 for unconditionally free egress, or Backblaze B2 if your monthly downloads stay under 3× what you store.
Running on a cloud VM? Pick that cloud's object storage, keep it in the same region, and you're fine. Setting up EC2 each time is annoying, but at least the transfer costs disappear.
I build observability tools with DuckDB + object storage.