Peter Fabricius

Iceduck: A Local Data Lakehouse Stack for Learning (No Cloud Needed)

I built Iceduck, an open-source Data Lakehouse stack that runs locally with Docker Compose. It combines MinIO, Apache Iceberg (via Polaris), Trino, Postgres, Spark, DuckDB, and Jupyter – all without cloud dependencies or costs.

Iceduck lets you explore open-source tools like Apache Iceberg, DuckDB, and Trino on your own machine, making it ideal for:

  • Learning Data Lakehouse concepts
  • Prototyping data pipelines
  • Testing integrations between tools

What’s Inside?

  • MinIO for S3-compatible storage
  • Apache Polaris as a REST catalog for Iceberg
  • DuckDB, Trino, and Spark as query engines
  • Postgres as a metastore
  • Jupyter for interactive exploration

All the details (setup, usage, and configuration) are in the README. I’d love to hear your thoughts:

  • Would you use this for learning or testing?
  • What’s missing or could be improved?

GitHub: pfabrici/iceduck (Apache 2.0 License)

#dataengineering #opensource #datalakehouse #docker #apacheiceberg
