Introduction
Apache Iceberg and DuckDB have established themselves as key players in the data architecture landscape. DuckDB 1.4 adds native support for writing Iceberg tables. Combined with Apache Polaris (an Iceberg REST catalog) and MinIO (S3-compatible object storage), it forms a promising open-source stack that offers efficiency, scalability and flexibility.
Requirements
Set up Polaris + MinIO
Clone the Apache Polaris repository
git clone https://github.com/apache/polaris.git
Build the Polaris Docker image
cd polaris
./gradlew :polaris-server:assemble -Dquarkus.container-image.build=true
Start Polaris + MinIO
podman compose -f getting-started/minio/docker-compose.yml up
# you can replace podman with docker
This creates a MinIO bucket named bucket123 and a Polaris catalog named quickstart_catalog, with these credentials:
- MinIO: user minio_root, password m1n1opwd
- Polaris: client ID root, client secret s3cr3t
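The containers can take a moment to come up. Below is a minimal readiness check, assuming the Polaris REST API listens on localhost:8181 and the MinIO console on localhost:9001 (the ports used later in this walkthrough). `wait_for_port` is a hypothetical helper, and an open port only means something is listening, not that the service has finished starting.

```python
import socket
import time


# Hypothetical helper: polls a TCP port until it accepts connections.
# Ports such as 8181 (Polaris) and 9001 (MinIO console) are assumptions
# taken from this walkthrough, not values discovered by the code.
def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect proves a listener exists, nothing more.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not up yet (or refused); retry shortly
    return False
```

For example, `wait_for_port("localhost", 8181)` returns True once Polaris accepts connections, and False if nothing answers within 60 seconds.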
Let's use DuckDB with Iceberg
Install DuckDB
curl https://install.duckdb.org | sh
Start the DuckDB client:
duckdb
# or, to start the DuckDB web UI instead:
duckdb -ui
Install and load the Iceberg extension
INSTALL ICEBERG;
LOAD ICEBERG;
Create a secret to connect to Apache Polaris
CREATE SECRET polaris_secret (
TYPE iceberg,
CLIENT_ID 'root',
CLIENT_SECRET 's3cr3t',
ENDPOINT 'http://localhost:8181/api/catalog'
);
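The secret holds OAuth2 client credentials. Our understanding is that the Iceberg REST client exchanges them for a bearer token at the catalog's `/v1/oauth/tokens` endpoint (the Polaris getting-started guide does this exchange with curl, using the scope `PRINCIPAL_ROLE:ALL`). The sketch below builds, but does not send, an equivalent request with Python's standard library, assuming the endpoint and credentials from this walkthrough; `build_token_request` is a hypothetical helper, not part of DuckDB or Polaris.

```python
import urllib.parse
import urllib.request


# Hypothetical helper mirroring an OAuth2 client-credentials token request.
# The endpoint path and default scope are assumptions based on the Polaris
# getting-started guide, not on DuckDB internals.
def build_token_request(
    endpoint: str, client_id: str, client_secret: str, scope: str = "PRINCIPAL_ROLE:ALL"
) -> urllib.request.Request:
    """Build (but do not send) a form-encoded token request."""
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": scope,
        }
    ).encode()
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/v1/oauth/tokens",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_token_request("http://localhost:8181/api/catalog", "root", "s3cr3t")
print(req.get_method(), req.full_url)
# Against the running stack, the token could be fetched with:
# import json
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["access_token"])
```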
Attach the Polaris catalog
ATTACH 'quickstart_catalog' AS polaris_catalog (
TYPE iceberg,
ENDPOINT 'http://localhost:8181/api/catalog'
);
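The first argument of ATTACH, 'quickstart_catalog', names the warehouse. In the Iceberg REST protocol a client typically starts with `GET /v1/config?warehouse=...`, and the response can supply a URL `prefix` that later calls use, such as `GET /v1/{prefix}/namespaces`. That Polaris uses the catalog name as this prefix is an assumption here; the two hypothetical helpers below only build those URLs.

```python
import urllib.parse


# Hypothetical helpers: they build Iceberg REST URLs, they do not call Polaris.
# Using the catalog name as the prefix is an assumption about Polaris.
def config_url(endpoint: str, warehouse: str) -> str:
    """URL of the REST 'get config' call for a given warehouse."""
    query = urllib.parse.urlencode({"warehouse": warehouse})
    return f"{endpoint.rstrip('/')}/v1/config?{query}"


def namespaces_url(endpoint: str, prefix: str) -> str:
    """URL for listing namespaces under a catalog prefix."""
    safe_prefix = urllib.parse.quote(prefix, safe="")
    return f"{endpoint.rstrip('/')}/v1/{safe_prefix}/namespaces"
```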
Create a new schema (a namespace in Polaris)
create schema polaris_catalog.duckdb;
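A DuckDB schema maps to an Iceberg namespace. Because Iceberg namespaces can be multi-level, the REST API represents a namespace as an array of strings. Presumably DuckDB issues a `POST /v1/{prefix}/namespaces` call with a body like the one built below; `create_namespace_body` is a hypothetical helper for illustration.

```python
import json
from typing import Dict, Optional


# Hypothetical helper: builds the JSON body of an Iceberg REST
# "create namespace" request ({"namespace": [...], "properties": {...}}).
def create_namespace_body(*levels: str, properties: Optional[Dict[str, str]] = None) -> str:
    if not levels:
        raise ValueError("a namespace needs at least one level")
    return json.dumps({"namespace": list(levels), "properties": properties or {}})


body = create_namespace_body("duckdb")  # single-level namespace, as in this walkthrough
print(body)
```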
Create a new Iceberg table
create table polaris_catalog.duckdb.taxi as
select * from read_parquet(
'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-01.parquet'
);
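The source file follows a `yellow_tripdata_YYYY-MM.parquet` naming pattern. The hypothetical helper below generates the URL for another month, for example to build the `read_parquet` argument when loading a different month; whether a given month has been published is not something the helper checks.

```python
# Hypothetical helper: reproduces the naming pattern of the URL used above.
# It does not verify that the file exists for the requested month.
BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"


def yellow_tripdata_url(year: int, month: int) -> str:
    """Monthly NYC yellow-taxi Parquet URL."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1..12, got {month}")
    return f"{BASE_URL}/yellow_tripdata_{year:04d}-{month:02d}.parquet"


print(yellow_tripdata_url(2025, 1))
```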
Query the Iceberg table
select * from polaris_catalog.duckdb.taxi limit 10;
Explore the created files on MinIO:
Open http://localhost:9001, log in with the MinIO credentials above, click the bucket bucket123 and browse duckdb/taxi. It contains two folders:
- data: the table's Parquet data files
- metadata: the Iceberg metadata files (table metadata JSON, plus Avro manifest lists and manifests)
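MinIO speaks the S3 API, so listing the bucket with any S3 client returns flat object keys; the folders shown in the console are just key prefixes. The hypothetical helper below groups such keys by their first sub-folder under the table prefix, which recovers the data/metadata split. The sample keys in the usage are made up; real file names are generated by DuckDB.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


# Hypothetical helper: groups flat object keys by the folder directly under
# the table prefix (e.g. "data" or "metadata"). The default prefix assumes
# the duckdb/taxi layout seen in this walkthrough.
def group_table_files(keys: Iterable[str], table_prefix: str = "duckdb/taxi/") -> Dict[str, List[str]]:
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        if not key.startswith(table_prefix):
            continue  # object belongs to another table
        folder, _, rest = key[len(table_prefix):].partition("/")
        if rest:  # skip keys sitting directly under the table prefix
            groups[folder].append(rest)
    return dict(groups)


# Made-up keys for illustration only.
sample_keys = [
    "duckdb/taxi/data/file-a.parquet",
    "duckdb/taxi/metadata/v1.metadata.json",
    "duckdb/taxi/metadata/snap-1.avro",
    "other/table/data/file-b.parquet",
]
print(group_table_files(sample_keys))
```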
Conclusion
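Combining Apache Polaris, MinIO and DuckDB (with its Iceberg extension) gives a fully open-source data platform that you can run locally. Because the tables are stored in the open Iceberg format and registered in a standard REST catalog, any Iceberg-compatible engine can read them, which helps avoid vendor lock-in, while DuckDB provides fast SQL on top. Together, these components provide the scalability and efficiency that modern data needs require.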
Sources
https://polaris.apache.org/in-dev/unreleased/getting-started/minio/
https://duckdb.org/2025/09/16/announcing-duckdb-140.html
https://duckdb.org/docs/stable/core_extensions/iceberg/iceberg_rest_catalogs
https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page