Florian Bernard

DuckDB + Iceberg: The ultimate synergy

Introduction

Apache Iceberg and DuckDB have established themselves as key players in the data architecture landscape. DuckDB 1.4 adds native support for Iceberg writes. Combined with Apache Polaris as the Iceberg REST catalog and MinIO as S3-compatible object storage, it forms an open-source stack that is efficient, scalable, and flexible.

Architecture diagram showing DuckDB + Polaris + MinIO

Requirements

Set up Polaris + MinIO

🪞 Clone the Apache Polaris repository

git clone https://github.com/apache/polaris.git

🏗️ Build the Polaris Docker image

cd polaris
./gradlew :polaris-server:assemble -Dquarkus.container-image.build=true 

▶️ Start Polaris + MinIO

podman compose -f getting-started/minio/docker-compose.yml up
# you can replace podman with docker

This creates a MinIO bucket named bucket123 and a Polaris catalog named quickstart_catalog, with the following default credentials:

  • MinIO: user minio_root, password m1n1opwd
  • Polaris: user root, password s3cr3t

Let's use DuckDB with Iceberg

🦆 Install DuckDB
curl https://install.duckdb.org | sh

▶️ Start the DuckDB client:
duckdb      # command-line client
duckdb -ui  # or: open the browser-based UI

🧊 Install and load Iceberg extension

INSTALL ICEBERG;
LOAD ICEBERG;

🔒 Create a secret to connect to Apache Polaris

CREATE SECRET polaris_secret (
    TYPE iceberg,
    CLIENT_ID 'root',
    CLIENT_SECRET 's3cr3t',
    ENDPOINT 'http://localhost:8181/api/catalog'
);

🔗 Attach the Polaris catalog

ATTACH 'quickstart_catalog' AS polaris_catalog (
    TYPE iceberg,
    ENDPOINT 'http://localhost:8181/api/catalog'
);
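
A quick sanity check after attaching can confirm that the secret is registered and the catalog is reachable. This is a minimal sketch using DuckDB's built-in duckdb_secrets() function and SHOW statements; the names polaris_secret and polaris_catalog come from the steps above, and the exact columns returned may differ between DuckDB versions.

```sql
-- List registered secrets; look for polaris_secret in the result.
SELECT name, type FROM duckdb_secrets();

-- List attached databases; polaris_catalog should be among them.
SHOW DATABASES;

-- List tables across all attached catalogs.
-- Tables created in the later steps should show up here.
SHOW ALL TABLES;
```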

🆕 Create a new schema (a namespace in Polaris)

create schema polaris_catalog.duckdb;

🚖 Create a new Iceberg table

create table polaris_catalog.duckdb.taxi as 
select * from read_parquet(
'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-01.parquet'
);
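
The CREATE TABLE ... AS statement above is one kind of Iceberg write. As a smaller illustration, the sketch below creates an empty table and appends rows with INSERT INTO. The table demo_notes and its columns are hypothetical names used only for this example, and the sketch assumes the attached catalog accepts table creation in the same way as the taxi table.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE polaris_catalog.duckdb.demo_notes (
    id   INTEGER,
    note VARCHAR
);

-- Append two rows; in Iceberg, each committed write produces a new snapshot.
INSERT INTO polaris_catalog.duckdb.demo_notes
VALUES (1, 'first row'), (2, 'second row');

-- Read the rows back through the catalog.
SELECT * FROM polaris_catalog.duckdb.demo_notes ORDER BY id;
```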

📊 Query the Iceberg table

select * from polaris_catalog.duckdb.taxi limit 10;

DuckDB query result table
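
Beyond a simple preview, the Iceberg table can be queried like any other DuckDB table. The sketch below inspects the schema and then aggregates trips by passenger count. The column names passenger_count, trip_distance, and total_amount are assumed from the NYC TLC yellow taxi schema; adjust them if DESCRIBE shows different names.

```sql
-- Show the column names and types DuckDB sees for the Iceberg table.
DESCRIBE polaris_catalog.duckdb.taxi;

-- Trips, average distance, and average total fare per passenger count.
-- Column names are assumed from the TLC yellow taxi data dictionary.
SELECT
    passenger_count,
    count(*)                     AS trips,
    round(avg(trip_distance), 2) AS avg_trip_distance,
    round(avg(total_amount), 2)  AS avg_total_amount
FROM polaris_catalog.duckdb.taxi
GROUP BY passenger_count
ORDER BY trips DESC;
```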

📁 Explore the created files on MinIO:
Open http://localhost:9001, log in with the MinIO credentials, click the bucket123 bucket, and explore the contents of the duckdb/taxi folder:

  • The data folder contains the table's Parquet data files
  • The metadata folder contains the Iceberg metadata files (table metadata, manifest lists, and manifests)

MinIO file explorer
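
The contents of the metadata folder can also be inspected from DuckDB. The Iceberg extension provides the iceberg_snapshots and iceberg_metadata table functions; the sketch below assumes they accept a fully qualified table name for tables in an attached REST catalog, as described in the DuckDB Iceberg REST catalog documentation listed in the sources.

```sql
-- One row per snapshot (commit) recorded for the table.
SELECT * FROM iceberg_snapshots(polaris_catalog.duckdb.taxi);

-- Manifest-level metadata, including the data files tracked by the table.
SELECT * FROM iceberg_metadata(polaris_catalog.duckdb.taxi);
```

The file paths returned by these functions should correspond to the objects visible in the MinIO console.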

Conclusion

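Combining Apache Polaris, MinIO, and DuckDB (with Iceberg support) gives you a powerful, fully open-source foundation for a data platform: Polaris manages the Iceberg catalog, MinIO stores the data and metadata files, and DuckDB reads and writes the tables. Because each layer builds on open formats and APIs, the stack helps avoid vendor lock-in while offering the performance, scalability, and efficiency that modern data workloads require.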

Sources

https://polaris.apache.org/in-dev/unreleased/getting-started/minio/
https://duckdb.org/2025/09/16/announcing-duckdb-140.html
https://duckdb.org/docs/stable/core_extensions/iceberg/iceberg_rest_catalogs
https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
