XIANGWEIXIAO
DuckDB in the Wild: What 6 Minutes of Benchmarking Across 4 Machines Taught Me About Real-World Performance


I care more about "can we ship this?" than "is this theoretically optimal?"

When I pick data tools, I usually ask three questions:

  • Will it run fast enough on the hardware we actually have?
  • How much does object storage overhead really cost us?
  • Can I estimate these things without building a full test lab?

With that mindset, I ran the same DuckDB workload across four different machines, comparing local disk against S3-compatible object storage. The goal was not to crown a winner. It was to calibrate my intuition for what "fast enough" looks like in practice.


Why I ran these tests

I had two recurring questions while working with DuckDB:

  1. Same SQL, different machine — why does it feel so much slower on some boxes? Is it CPU, disk, or something else entirely?
  2. Local folder vs object storage — where does the extra time actually go?

So I ran the same job on a MacBook, an Ubuntu server, a Windows workstation, and a small cloud instance. I kept the query identical, the data identical, and the DuckDB version identical. Everything else (CPU, memory, filesystem, network) I let vary.

This is not science. It is reconnaissance.


What I held constant (and what I did not)

This is not a lab benchmark. It is an engineering gut-check: same query, different machines, different storage. The numbers are messy, the comparisons are unfair, and yet they are surprisingly useful.

Same across all runs:

  • DuckDB version: v1.5.2 (Variegata)
  • Data scale:
    • Full dataset: ~64.5M rows, ~13.7 GB source, ~3.5 GB output
    • Sample dataset: ~18M rows, ~4.0 GB source, ~1.0 GB output
  • Job semantics: identical ELT pattern (read → transform → write)

Storage setups:

| Storage | Deployment | What this actually tests |
|---|---|---|
| Local disk | Native filesystem | Baseline: no network, no protocol overhead |
| MinIO | Single-node, same machine as DuckDB | S3 protocol overhead without network latency |
| Alibaba Cloud OSS | Cloud VM via internal network | Cloud CPU + memory + network combined |

Important caveat: I did not test "MinIO on a different machine." From experience, cross-machine object storage introduces bandwidth and latency costs that would push the overhead well beyond what I measured here.


Full dataset results: four machines, two storage types

How to read the numbers:

  • Wall time: total elapsed time from start to finish
  • Rough read MB/s: (source GB × 1024) ÷ wall time

This is not raw disk throughput. It includes Parquet parsing, SQL execution, and write operations. Think of it as "end-to-end productivity" rather than hardware specs.
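The formula above can be sanity-checked with a tiny helper (Python here, purely for illustration; the inputs are the figures from the table below):

```python
def rough_read_mbps(source_gb: float, wall_seconds: float) -> float:
    """End-to-end throughput: (source GB x 1024) / wall time.
    Includes Parquet parsing, SQL execution, and writes -- not raw disk speed."""
    return source_gb * 1024 / wall_seconds

# macOS local-disk run: 13.7 GB in 43 s
print(round(rough_read_mbps(13.7, 43)))   # prints 326
# Windows local-disk run: same data in 331 s
print(round(rough_read_mbps(13.7, 331)))  # prints 42
```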

| Environment | Data source | Wall time | Rows processed | Rough MB/s | Source size | Output size |
|---|---|---|---|---|---|---|
| macOS | Local disk | 43s | 64.5M | ~326 | 13.7 GB | 3.5 GB |
| Ubuntu | Local disk | 58s | 64.5M | ~242 | 13.7 GB | 3.5 GB |
| Ubuntu | MinIO (same machine) | 62s | 64.5M | ~226 | 13.7 GB | 3.5 GB |
| Windows | Local disk | 331s | 64.5M | ~42 | 13.7 GB | 3.5 GB |

How I interpret these

macOS vs Ubuntu (both local disk)

The 15-second gap (43s vs 58s) likely comes from CPU microarchitecture and memory bandwidth. The Ubuntu box runs an older Xeon; the MacBook has a newer M-series chip with better vectorization. Same disk type, different compute muscle.

Ubuntu local vs MinIO (same machine)

58s → 62s is about a 7% overhead. That is the cost of S3 protocol parsing, HTTP client work, and MinIO's request handling — without any network latency. If MinIO were on a separate machine, expect significantly more.
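The ~7% figure is just relative slowdown; a one-line sketch (Python, illustrative only) makes the arithmetic explicit:

```python
def overhead_pct(baseline_s: float, measured_s: float) -> float:
    """Relative slowdown of the measured wall time vs the baseline, in percent."""
    return (measured_s - baseline_s) / baseline_s * 100

# Ubuntu: local disk 58 s vs same-machine MinIO 62 s
print(f"{overhead_pct(58, 62):.1f}%")  # prints 6.9%
```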

Windows local disk (331s)

This number is an outlier, and I treat it as such. Anti-malware scanning, background services, power management policies, and NTFS characteristics can all drag down I/O on Windows. I would not generalize from this single data point without testing in a clean environment first.


Sample dataset results: cloud + OSS vs MacBook

This comparison is intentionally unfair. I am putting it here anyway because unfair comparisons happen in the real world all the time.

Cloud VM specs:

| Attribute | Value |
|---|---|
| vCPU | 2 |
| Threads | 4 (2 cores × hyperthreading) |
| Memory | ~7.3 GB |
| Disk | Cloud SSD (ESSD class) |
| Network | Internal VPC to OSS |

Comparison baseline:

MacBook — Apple M5, 10 cores, 32 GB RAM, local NVMe (APFS)

| Environment | Data source | Wall time | Rows processed | Rough MB/s | Source size | Output size |
|---|---|---|---|---|---|---|
| Debian (2 vCPU cloud) | OSS (internal) | 61s | 18.0M | ~67 | 4.0 GB | 1.0 GB |
| macOS | Local disk | 23s | 18.0M | ~178 | 4.0 GB | 1.0 GB |

How I interpret this

The cloud instance is ~2.7× slower — but that is not a story about "cloud is slow." It is a story about 2 vCPU + 7 GB RAM + network versus 10 cores + 32 GB RAM + local NVMe.

I did not profile deeply enough to separate CPU time from I/O wait from network latency. So I cannot tell you which factor matters most. What I can tell you: if someone hands you a 2-core cloud VM and asks for a performance estimate, 2–3× slower than a modern laptop is a reasonable working assumption.
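That working assumption can be turned into a back-of-the-envelope estimator. A sketch (Python; the 2–3× range and the 23-second laptop baseline come from the runs above):

```python
def estimate_cloud_wall_time(laptop_seconds: float,
                             slowdown_range=(2.0, 3.0)) -> tuple[float, float]:
    """Rough wall-time range for a small (2 vCPU) cloud VM, given the same
    job's wall time on a modern laptop and an assumed slowdown factor range."""
    lo, hi = slowdown_range
    return laptop_seconds * lo, laptop_seconds * hi

# Laptop ran the sample job in 23 s; the observed cloud time was 61 s
low, high = estimate_cloud_wall_time(23)
print(f"expect {low:.0f}-{high:.0f} s")  # prints "expect 46-69 s"
```

The measured 61 seconds lands inside that 46–69 second window, which is about as much validation as a rule of thumb deserves.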


Hardware inventory

For those who want to reproduce or sanity-check:

| Machine | OS | Kernel | CPU | Cores / Threads | RAM | Primary storage |
|---|---|---|---|---|---|---|
| MacBook | macOS 26.4 | n/a | Apple M5 | 10 / 10 | 32 GB | Apple NVMe SSD (APFS) |
| Workstation | Windows 10 Pro | n/a | Intel i7-10750H | 6 / 12 | ~64 GB | SK Hynix SSD (NTFS) |
| Server | Ubuntu 24.04.3 | 6.17.0-19 | Xeon E5-2698 v3 | 16 / 32 | ~188 GB | Samsung NVMe (ext4) |
| Cloud VM | Debian 11 | 5.10.0-26 | Xeon Platinum (virtual) | 2 / 4 | ~7.3 GB | Cloud SSD (ESSD) |

Minimal reproduction code

Here are the exact patterns I used, in case you want to run your own gut-check.

From MinIO (S3-compatible, local):

```sql
INSTALL httpfs;
LOAD httpfs;

CREATE OR REPLACE SECRET (
    TYPE s3,
    KEY_ID 'your_access_key',
    SECRET 'your_secret_key',
    REGION 'us-east-1',
    ENDPOINT '127.0.0.1:9000',
    URL_STYLE 'path',
    USE_SSL false  -- a local MinIO endpoint speaks plain HTTP; without this, DuckDB attempts HTTPS
);

COPY (
    SELECT *
    FROM read_csv(
        's3://bucket/raw/**/*.csv',
        ignore_errors = true,
        filename = true,
        union_by_name = true,
        sample_size = 200
    )
) TO 's3://bucket/output/result.parquet' (FORMAT PARQUET);
```

From local filesystem:

```sql
COPY (
    SELECT *
    FROM read_csv_auto(
        '/path/to/data/**/*.csv',
        ignore_errors = true,
        filename = true,
        union_by_name = true,
        sample_size = 200
    )
) TO '/path/to/output/result.parquet' (FORMAT PARQUET);
```

From Alibaba Cloud OSS:

```sql
INSTALL httpfs;
LOAD httpfs;

CREATE OR REPLACE SECRET (
    TYPE s3,
    KEY_ID 'your_oss_key',
    SECRET 'your_oss_secret',
    REGION 'cn-hangzhou',
    ENDPOINT 'oss-cn-hangzhou.aliyuncs.com'
);

COPY (
    SELECT *
    FROM read_csv(
        's3://bucket/raw/**/*.csv',
        ignore_errors = true,
        filename = true,
        union_by_name = true,
        sample_size = 200
    )
) TO 's3://bucket/output/result.parquet' (FORMAT PARQUET);
```

What I actually learned

Three things stick with me:

  1. Same query, same data, 8× wall-time spread — from 43 seconds to 331 seconds. Hardware and environment matter more than I sometimes remember.

  2. Local MinIO adds ~7% overhead when co-located. That is low enough that protocol compatibility is worth it for my use cases. Cross-machine MinIO would be a different story.

  3. Unfair comparisons are still useful — as long as you label them honestly. A 2-core cloud VM being 2–3× slower than a modern laptop is not surprising, but having a concrete number helps with capacity conversations.


What this means for picking your setup

I would not choose hardware based on these numbers alone. But I would use them to:

  • Set expectations: If someone proposes a 2-core cloud instance for heavy batch work, 60+ seconds for 4 GB of data is a realistic starting assumption.
  • Justify local object storage: a same-machine MinIO instance is cheap to run and S3-compatible, so the ~7% tax is usually acceptable.
  • Flag outliers early: That 331-second Windows run tells me "something else is going on here" — whether I investigate further depends on whether that machine actually matters for production.
