I care more about "can we ship this?" than "is this theoretically optimal?"
When I pick data tools, I usually ask three questions:
- Will it run fast enough on the hardware we actually have?
- How much does object storage overhead really cost us?
- Can I estimate these things without building a full test lab?
With that mindset, I ran the same DuckDB workload across four different machines, comparing local disk against S3-compatible object storage. The goal was not to crown a winner. It was to calibrate my intuition for what "fast enough" looks like in practice.
Why I ran these tests
I had two recurring questions while working with DuckDB:
- Same SQL, different machine — why does it feel so much slower on some boxes? Is it CPU, disk, or something else entirely?
- Local folder vs object storage — where does the extra time actually go?
So I ran the same job on a MacBook, an Ubuntu server, a Windows workstation, and a small cloud instance. I kept the query identical, the data identical, and the DuckDB version identical. Everything else — CPU, memory, filesystem, network — I let vary.
This is not science. It is reconnaissance.
What I held constant (and what I did not)
This is not a lab benchmark. It is an engineering gut-check: same query, different machines, different storage. The numbers are messy, the comparisons are unfair, and yet they are surprisingly useful.
Same across all runs:
- DuckDB version: v1.5.2 (Variegata)
- Data scale:
  - Full dataset: ~64.5M rows, ~13.7 GB source, ~3.5 GB output
  - Sample dataset: ~18M rows, ~4.0 GB source, ~1.0 GB output
- Job semantics: identical ELT pattern (read → transform → write)
Storage setups:
| Storage | Deployment | What this actually tests |
|---|---|---|
| Local disk | Native filesystem | Baseline: no network, no protocol overhead |
| MinIO | Single-node, same machine as DuckDB | S3 protocol overhead without network latency |
| Alibaba Cloud OSS | Cloud VM via internal network | Cloud CPU + memory + network combined |
Important caveat: I did not test "MinIO on a different machine." From experience, cross-machine object storage introduces bandwidth and latency costs that would push the overhead well beyond what I measured here.
Full dataset results: four machines, two storage types
How to read the numbers:
- Wall time: total elapsed time from start to finish
- Rough read MB/s: (source GB × 1024) ÷ wall time
This is not raw disk throughput. It includes Parquet parsing, SQL execution, and write operations. Think of it as "end-to-end productivity" rather than hardware specs.
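The throughput column in the tables is just that division. A minimal Python sketch, using the macOS and Ubuntu rows as inputs:

```python
def rough_mb_per_s(source_gb: float, wall_s: float) -> float:
    """End-to-end throughput: (source GB x 1024) / wall time in seconds.

    Includes Parquet parsing, SQL execution, and writes -- not raw disk speed.
    """
    return source_gb * 1024 / wall_s

# Full-dataset runs: 13.7 GB in 43 s (macOS) and 58 s (Ubuntu)
print(round(rough_mb_per_s(13.7, 43)))  # → 326
print(round(rough_mb_per_s(13.7, 58)))  # → 242
```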
| Environment | Data source | Wall time | Rows processed | Rough MB/s | Source size | Output size |
|---|---|---|---|---|---|---|
| macOS | Local disk | 43s | 64.5M | ~326 | 13.7 GB | 3.5 GB |
| Ubuntu | Local disk | 58s | 64.5M | ~242 | 13.7 GB | 3.5 GB |
| Ubuntu | MinIO (same machine) | 62s | 64.5M | ~226 | 13.7 GB | 3.5 GB |
| Windows | Local disk | 331s | 64.5M | ~42 | 13.7 GB | 3.5 GB |
How I interpret these
macOS vs Ubuntu (both local disk)
The 15-second gap (43s vs 58s) likely comes from CPU microarchitecture and memory bandwidth. The Ubuntu box runs an older Xeon; the MacBook has a newer M-series chip with better vectorization. Same disk type, different compute muscle.
Ubuntu local vs MinIO (same machine)
58s → 62s is about a 7% overhead. That is the cost of S3 protocol parsing, HTTP client work, and MinIO's request handling — without any network latency. If MinIO were on a separate machine, expect significantly more.
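The ~7% figure is simple relative overhead, worth writing down because it is the number I reuse in the conclusions below:

```python
local_s, minio_s = 58, 62  # Ubuntu wall times from the table above

# Relative cost of the S3 protocol path, co-located MinIO vs local disk
overhead_pct = (minio_s - local_s) / local_s * 100
print(f"{overhead_pct:.1f}%")  # → 6.9%
```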
Windows local disk (331s)
This number is an outlier, and I treat it as such. Anti-malware scanning, background services, power management policies, and NTFS characteristics can all drag down I/O on Windows. I would not generalize from this single data point without testing in a clean environment first.
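"Treat it as an outlier" can be made mechanical. This is not part of my original tooling, just a crude rule of thumb sketched in Python: flag any run slower than 3× the median (the 3× cutoff is an arbitrary choice, not derived from these runs).

```python
from statistics import median

# Full-dataset wall times in seconds, from the table above
runs = {"macOS": 43, "Ubuntu local": 58, "Ubuntu MinIO": 62, "Windows": 331}

cutoff = 3 * median(runs.values())  # 3 x 60.0 = 180.0
outliers = [name for name, t in runs.items() if t > cutoff]
print(outliers)  # → ['Windows']
```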
Sample dataset results: cloud + OSS vs MacBook
This comparison is intentionally unfair. I am putting it here anyway because unfair comparisons happen in the real world all the time.
Cloud VM specs:
| Attribute | Value |
|---|---|
| vCPU | 2 |
| Threads | 4 (2 cores × hyperthreading) |
| Memory | ~7.3 GB |
| Disk | Cloud SSD (ESSD class) |
| Network | Internal VPC to OSS |
Comparison baseline:
MacBook — Apple M5, 10 cores, 32 GB RAM, local NVMe (APFS)
| Environment | Data source | Wall time | Rows processed | Rough MB/s | Source size | Output size |
|---|---|---|---|---|---|---|
| Debian (2 vCPU cloud) | OSS (internal) | 61s | 18.0M | ~67 | 4.0 GB | 1.0 GB |
| macOS | Local disk | 23s | 18.0M | ~178 | 4.0 GB | 1.0 GB |
How I interpret this
The cloud instance is ~2.7× slower — but that is not a story about "cloud is slow." It is a story about 2 vCPU + 7 GB RAM + network versus 10 cores + 32 GB RAM + local NVMe.
I did not profile deeply enough to separate CPU time from I/O wait from network latency. So I cannot tell you which factor matters most. What I can tell you: if someone hands you a 2-core cloud VM and asks for a performance estimate, 2–3× slower than a modern laptop is a reasonable working assumption.
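The 2.7× figure, and the working assumption it suggests, in Python. The helper name is mine, not from any tooling used in the runs:

```python
laptop_s, cloud_s = 23, 61  # sample-dataset wall times from the table above

# Observed slowdown of the 2-vCPU VM vs the laptop baseline
print(round(cloud_s / laptop_s, 1))  # → 2.7

def estimate_cloud_s(laptop_baseline_s: float, slowdown: float = 2.7) -> float:
    """Back-of-envelope ETA for a small cloud VM, given a laptop timing."""
    return laptop_baseline_s * slowdown

print(round(estimate_cloud_s(23)))  # → 62
```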
Hardware inventory
For those who want to reproduce or sanity-check:
| Machine | OS | Kernel | CPU | Cores / Threads | RAM | Primary storage |
|---|---|---|---|---|---|---|
| MacBook | macOS 26.4 | — | Apple M5 | 10 / 10 | 32 GB | Apple NVMe SSD (APFS) |
| Workstation | Windows 10 Pro | — | Intel i7-10750H | 6 / 12 | ~64 GB | SK Hynix SSD (NTFS) |
| Server | Ubuntu 24.04.3 | 6.17.0-19 | Xeon E5-2698 v3 | 16 / 32 | ~188 GB | Samsung NVMe (ext4) |
| Cloud VM | Debian 11 | 5.10.0-26 | Xeon Platinum (virtual) | 2 / 4 | ~7.3 GB | Cloud SSD (ESSD) |
Minimal reproduction code
Here are the exact patterns I used, in case you want to run your own gut-check.
From MinIO (S3-compatible, local):
```sql
INSTALL httpfs;
LOAD httpfs;

CREATE OR REPLACE SECRET minio_secret (
    TYPE s3,
    KEY_ID 'your_access_key',
    SECRET 'your_secret_key',
    REGION 'us-east-1',
    ENDPOINT '127.0.0.1:9000',
    URL_STYLE 'path',
    USE_SSL false  -- local MinIO serves plain HTTP by default
);

COPY (
    SELECT *
    FROM read_csv(
        's3://bucket/raw/**/*.csv',
        ignore_errors = true,
        filename = true,
        union_by_name = true,
        sample_size = 200
    )
) TO 's3://bucket/output/result.parquet' (FORMAT PARQUET);
```
From local filesystem:
```sql
COPY (
    SELECT *
    FROM read_csv_auto(
        '/path/to/data/**/*.csv',
        ignore_errors = true,
        filename = true,
        union_by_name = true,
        sample_size = 200
    )
) TO '/path/to/output/result.parquet' (FORMAT PARQUET);
```
From Alibaba Cloud OSS:
```sql
INSTALL httpfs;
LOAD httpfs;

CREATE OR REPLACE SECRET oss_secret (
    TYPE s3,
    KEY_ID 'your_oss_key',
    SECRET 'your_oss_secret',
    REGION 'cn-hangzhou',
    ENDPOINT 'oss-cn-hangzhou.aliyuncs.com'
);

COPY (
    SELECT *
    FROM read_csv(
        's3://bucket/raw/**/*.csv',
        ignore_errors = true,
        filename = true,
        union_by_name = true,
        sample_size = 200
    )
) TO 's3://bucket/output/result.parquet' (FORMAT PARQUET);
```
What I actually learned
Three things stick with me:
- Same query, same data, 8× wall-time spread — from 43 seconds to 331 seconds. Hardware and environment matter more than I sometimes remember.
- Local MinIO adds ~7% overhead when co-located. That is low enough that protocol compatibility is worth it for my use cases. Cross-machine MinIO would be a different story.
- Unfair comparisons are still useful — as long as you label them honestly. A 2-core cloud VM being 2–3× slower than a modern laptop is not surprising, but having a concrete number helps with capacity conversations.
What this means for picking your setup
I would not choose hardware based on these numbers alone. But I would use them to:
- Set expectations: If someone proposes a 2-core cloud instance for heavy batch work, 60+ seconds for 4 GB of data is a realistic starting assumption.
- Justify local object storage: A same-machine MinIO instance is cheap to run and easy to integrate, so the ~7% protocol tax is usually acceptable.
- Flag outliers early: That 331-second Windows run tells me "something else is going on here" — whether I investigate further depends on whether that machine actually matters for production.
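The "set expectations" point reduces to arithmetic: pick an assumed end-to-end throughput and divide. A hedged sketch, reusing the Rough MB/s metric defined earlier (the helper name is mine):

```python
def estimate_wall_s(source_gb: float, assumed_mb_per_s: float) -> float:
    """Wall-time estimate from an assumed end-to-end throughput in MB/s."""
    return source_gb * 1024 / assumed_mb_per_s

# 4 GB at the ~67 MB/s measured on the 2-vCPU VM reading from OSS
print(round(estimate_wall_s(4.0, 67)))  # → 61
```

Swap in your own throughput assumption; the point is having a defensible starting number, not a precise forecast.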