Iceduck: A Local Data Lakehouse Stack for Learning (No Cloud Needed)

#dataengineering #learning #opensource #showdev

I built Iceduck, an open-source Data Lakehouse stack that runs locally with Docker Compose. It combines MinIO, Apache Iceberg (via Polaris), Trino, Postgres, Spark, DuckDB, and Jupyter – all without cloud dependencies or costs.

Iceduck lets you explore open-source tools like Apache Iceberg, DuckDB, and Trino on your own machine, making it ideal for:

Learning Data Lakehouse concepts
Prototyping data pipelines
Testing integrations between tools

What’s Inside?

MinIO for S3-compatible storage
Apache Polaris as a REST catalog for Iceberg
DuckDB, Trino, and Spark as query engines
Postgres as a metastore
Jupyter for interactive exploration
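
As a rough illustration of how these pieces fit together, here is a minimal Python sketch of the properties a client such as PyIceberg would need to reach a Polaris REST catalog that keeps its data in MinIO. Every value below (ports, catalog name, credentials, scope) is an assumed placeholder and not taken from the Iceduck repository; the README has the real settings.

```python
# Sketch only: all defaults are assumed placeholders, not Iceduck's actual configuration.

def rest_catalog_properties(
    polaris_url="http://localhost:8181/api/catalog",  # assumed Polaris REST endpoint
    warehouse="iceduck",                               # hypothetical catalog name
    credential="client-id:client-secret",              # placeholder OAuth2 client credentials
    minio_url="http://localhost:9000",                 # assumed MinIO S3 endpoint
    access_key="minio-access-key",                     # placeholder
    secret_key="minio-secret-key",                     # placeholder
):
    """Build PyIceberg-style properties for a REST catalog on S3-compatible storage."""
    return {
        "type": "rest",                      # talk to the catalog over the Iceberg REST protocol
        "uri": polaris_url,
        "warehouse": warehouse,
        "credential": credential,
        "scope": "PRINCIPAL_ROLE:ALL",       # scope commonly used with Polaris; an assumption here
        "s3.endpoint": minio_url,            # point the S3 client at MinIO instead of AWS
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
    }


def list_namespaces(properties):
    """List catalog namespaces (needs `pip install pyiceberg` and a running stack)."""
    # Imported lazily so the config helper above works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("local", **properties)
    return catalog.list_namespaces()
```

With the stack running and the placeholders replaced, calling `list_namespaces(rest_catalog_properties())` from a Jupyter cell should return the namespaces the catalog knows about. The same split (a REST catalog URI plus an S3 endpoint) is what Trino, Spark, and DuckDB need as well, each in its own configuration format.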

All the details (setup, usage, and configuration) are in the README. I’d love to hear your thoughts:

Would you use this for learning or testing?
What’s missing or could be improved?

GitHub: pfabrici/iceduck (Apache 2.0 License)

#dataengineering #opensource #datalakehouse #docker #apacheiceberg

DEV Community