I built Iceduck, an open-source Data Lakehouse stack that runs locally with Docker Compose. It combines MinIO, Apache Iceberg (via Polaris), Trino, Postgres, Spark, DuckDB, and Jupyter – all without cloud dependencies or costs.
Iceduck lets you explore open-source tools like Apache Iceberg, DuckDB, and Trino on your own machine, making it ideal for:
- Learning Data Lakehouse concepts
- Prototyping data pipelines
- Testing integrations between tools
What’s Inside?
- MinIO for S3-compatible storage
- Apache Polaris as a REST catalog for Iceberg
- DuckDB, Trino, Spark as query engines
- Postgres as a metastore
- Jupyter for interactive exploration
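To make the catalog-plus-storage pairing above more concrete, here is a minimal Python sketch of how a client might describe a connection to an Iceberg REST catalog backed by S3-compatible storage. The property keys follow PyIceberg's catalog configuration, but the URLs, ports, warehouse name, and credentials are placeholders I made up for illustration, not values taken from Iceduck; check the README for the real ones.

```python
def rest_catalog_properties(catalog_uri, warehouse, s3_endpoint, access_key, secret_key):
    """Build the property dict PyIceberg expects for a REST catalog on S3-compatible storage."""
    return {
        "type": "rest",                      # talk to the catalog over the Iceberg REST protocol
        "uri": catalog_uri,                  # REST catalog endpoint (Polaris in this stack)
        "warehouse": warehouse,              # catalog/warehouse name registered in the catalog
        "s3.endpoint": s3_endpoint,          # point the S3 client at MinIO instead of AWS
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
    }


def connect(props):
    """Open the catalog. Requires `pip install pyiceberg` and a running stack."""
    from pyiceberg.catalog import load_catalog  # imported lazily so the sketch loads without it
    return load_catalog("local", **props)


# All values below are hypothetical placeholders, not Iceduck's actual settings.
props = rest_catalog_properties(
    catalog_uri="http://localhost:8181/api/catalog",
    warehouse="demo_warehouse",
    s3_endpoint="http://localhost:9000",
    access_key="YOUR_MINIO_ACCESS_KEY",
    secret_key="YOUR_MINIO_SECRET_KEY",
)

if __name__ == "__main__":
    for key, value in props.items():
        # Avoid printing the secret in plain text.
        shown = "***" if "secret" in key else value
        print(f"{key} = {shown}")
```

Calling `connect(props)` would only succeed once the containers are up and the catalog accepts the supplied credentials; a real Polaris setup will likely also need authentication properties that this sketch leaves out.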
All the details (setup, usage, and configuration) are in the README. I’d love to hear your thoughts:
- Would you use this for learning or testing?
- What’s missing or could be improved?
GitHub: pfabrici/iceduck (Apache 2.0 License)
#dataengineering #opensource #datalakehouse #docker #apacheiceberg